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ADDENDUM 

The  following  remark  should  be  added  on  page  44:- 

Enquiries  on  the  purchase  or  use  of  NPL  routines  for  Function  Minimisation 
should  be  addressed  to  NPL. 
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SUMMARY 

Formulae  for  updating  matrices  in  connexion  with  the  quasi-Newton  iteration 
are  derived  so  as  to  emphasise  the  principles  involved.  Computational  aspects 
are  discussed  and  two  FORTRAN  programs  for  nonlinear  minimisation  subject  to 
bounds  on  the  variables  are  described  and  their  use  of  finite  difference 
derivatives,  treatment  of  bounds,  line  searches  and  post-optimal  sensitivity 
facilities  are  compared  to  demonstrate  the  manner  in  which  the  subject  has  pro- 
gressed in  recent  years.  Brief  user  guides  to  the  two  programs  are  contained  in 
Appendices . 
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INTRODUCTION 


Many  engineering  problems  can  be  expressed  mathematically  as  the  optimisa- 
tion (meaning  maximisation  or  minimisation)  of  a function  of  several  variables, 
usually  subject  to  constraints  which  may  range  in  complexity  from  simple  upper 
and/or  lower  bounds  on  the  variables  to  highly  nonlinear  functions  whose  partial 
derivatives  (with  respect  to  the  variables)  cannot  be  obtained  explicitly. 

Applied  Mathematics  Division  at  RAE  has  followed  developments  in  nonlinear 
optimisation,  largely  through  Piggott,  ever  since  the  subject  began  to  gather 
momentum  in  the  mid-1960s.  A success  ion  of  very  useful  computer  programs  was 
produced  implementing  the  leading  methods.  The  present  work  is  a result  of  our 
continuing  interest  and  reflects  the  progress  made  in  unconstrained  minimisation 
since  the  last  Report  (Piggott* ). 

The  program  M21UNC  which  we  shall  be  describing  hopefully  maintains  the 
standard  of  reliability  set  by  Piggott  while  being  significantly  more  efficient 
than  its  predecessor,  M0402.  The  latter  is  also  discussed  for  comparison 
purposes  and  because  M21UNC  contains  subroutines  written  at  NPL  thereby 
restricting  it  to  applications  within  RAE.  Both  programs  are  quite  general  and 
thus  should  be  of  interest  to  anyone  in  RAE  with  a problem  in  optimisation 
(including  nonlinear  least  squares  curve  fitting  and  even  the  solution  of  non- 
linear equations) . They  also  form  the  basis  of  two  programs  for  solving  non- 

...  2 
linearly  constrained  minimisation  problems  (Purcell  ) . 

It  is  convenient  to  incorporate  bounds  on  the  variables  into  an 
'unconstrained'  routine  since  they  are  a help  rather  than  a hindrance  in  that 
they  effectively  reduce  the  dimensionality  of  the  unconstrained  problem.  Thus 
in  what  follows,  when  referring  to  unconstrained  problems  we  include  the 
possibility  of  bounds  and  reserve  the  term  'constraints'  for  more  complicated 
restrictions. 

Many  copies  of  M0402  have  been  supplied  to  RAE  users.  Applications 
(mainly  least  squares  curve  fitting)  have  included  wind-tunnel  modelling, 
design  of  digital  filters,  helicopter  vibration,  satellite  launching,  modelling 
human  response  times  to  visual  data,  etc.  M21UNC,  being  a new  program,  has  not 
been  used  in  earnest  in  its  own  right  but  its  robustness  and  performance  have 
been  verified  on  constrained  optimisation  problems  originating  in  Aerodynamics 
Department. 

Both  M0402  and  M21UNC  employ  a quasi-Newton  algorithm.  This  has  been  the 
most  successful  approach  to  nonlinear  minimisation  but  every  aspect  of  the 
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original  algorithm  (Fletcher  and  Powell-*)  has  been  revised  and  improved.  The 
dual  purposes  of  this  Report  are  to  explain  how  Newton's  iteration  was  developed 
into  a practical  algorithm,  with  particular  emphasis  on  matrix  recurrence 
formulae,  and  to  introduce  a FORTRAN  computer  program  which  incorporates  all 
important  subsequent  enhancements. 

The  rest  of  this  Report  is  laid  out  as  follows.  We  begin  by  describing 
Newton's  iteration  for  function  minimisation  and  explain  what  is  meant  by 
'quasi-Newton'.  In  section  3 we  derive  formulae  for  updating  the  approximate 
inverse  Hessian  matrix,  to  H^+*^  where  i is  the  iteration  number  and 

in  so  doing  we  hope  that  the  underlying  principles  will  become  clear.  By 
contrast,  most  previous  authors  have  tended  to  state  their  particular  formula 
first  and  demonstrated  its  properties  afterwards.  We  shall  show  that  the  most 
important  ( ie  the  most  widely  used)  updates  are  also  the  simplest  which  fulfil 
the  fundamental  requirements.  The  way  in  which  these  specific  formulae  fit  into 
general  families  is  explained  and  equivalent  expressions  for  correcting 
approximations  to  the  Hessian  itself  (B  ^ to  B^+1^)  are  deduced  and  their 
advantages  outlined. 

Section  4 considers  the  difficulties  arising  when  partial  derivatives  have 
to  be  approximated  by  finite  differences  because  the  objective  function  F(x),, 
is  too  complicated  to  differentiate  explicitly.  The  ways  in  which  bounds  on  the 
variables  can  be  incorporated  are  discussed  in  section  5 while  section  6 explains 
the  one-dimensional  minimisation  (line  search)  procedure.  Section  7 summarises 
the  relative  merits  of  M0402  and  M21UNC  and  section  8 deals  with  sensitivity 
analysis.  There  we  describe  how  the  program  user  can  and  should  (unless  he  is 
only  interested  in  reducing  F to  some  'acceptable'  level)  explore  the  accuracy 
of  the  solution  found  and  its  sensitivity  to  small  changes  in  the  variables  and/ 
or  the  calculation  of  F . The  algorithms  implemented  as  M0402  and  M21UNC  are 
sketched  at  the  end  of  section  3.3  and  as  section  5.3  respectively  while  user 
guides  to  the  two  programs  appear  as  Appendices  B and  C. 

2 THE  QUASI -NEWTON  ITERATION 

This  Report  is  concerned  with  the  following  problem. 

Minimise  F(Xj , . . . ,x^, . . . . ,x^)  (2-1) 

subject  to  XL.  < x.  < XU,  k = 1,  2,  ...,  n 

. k K * , . 132 

where  F is  a nonlinear  function  of  the  . The  simplest  situation  would  be 

if  F was  a quadratic  form  and  the  method  we  shall  describe  is  based  on  quadratic 
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approximation  of  general  objective  functions.  Most  of  the  effort  is  therefore 
expended  on  directing  the  solution  path  into  a neighbourhood  of  a minimum  where 
F is  effectively  quadratic  so  that  rapid  final  convergence  is  achieved.  We 
must  emphasise  that  the  global  minimum  will  only  be  obtained  if  the  problem  is 
convex.  Furthermore,  the  algorithm  theoretically  requires  the  objective  function 
and  its  first  and  second  derivatives  to  be  continuous.  For  many  practical  prob- 
lems these  are  severe  conditions.  However,  the  method  can  often  be  successfully 
applied  under  weaker  conditions  or,  at  worst,  substantial  progress  made  towards 
a solution. 

The  method  consists  of  a sequence  of  steps  (iterations)  given  by 


x<i+1) 


x<i)  _ a<i>H(i)gU> 


(2-2) 


where  g^  is  the  vector  of  first  partial  derivatives  at  x^  . This  is  * 
merely  a combination  of  Newton's  iteration  for  locating  a zero  of  a function  and 
the  condition  for  a turning  point  (g  = 0)  extended  to  several  variables.  In  one 
variable 


+ (x 


(i+D 


0 . 


Therefore , 


(i+1) 

x 


dF/dx 

2 2 
d F/dx 


(2-3) 


Iteration  (2-3)  reaches  the  turning  point  in  one  step  if  F is  quadratic 

. ...  2 2 ... 
and  the  point  is  a minimum  if  d F/dx  > 0 . In  the  multi-dimensional  analogue 

(2-2),  is  the  inverse  of  the  second  derivative  (Hessian)  matrix  and  it 

must  be  positive  definite  at  a minimum,  ie 


u H*u  > 0 


for  any  non-zero  u 


(2-4) 


The  fundamental  idea  of  the  quasi-Newton  iteration  (Davidon  ) is  that  H should 
never  be  evaluated  explicitly  by  double  differentiation  and  matrix  inversion. 
Instead  H is  gradually  constructed  by  incorporating  information  acquired  at 
each  iteration  (specifically,  g - g and  x - x ) . In  principle 

then,  H is  updated  after  every  iteration  until  at  the  solution,  x*  , where 
g*  = 0 , H*  is  the  exact  inverse  of  the  true  Hessian,  G(x*)  . 
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An  alternative  interpretation  of  (2-2)  is  to  treat  (x^+^  - x^)  as  the 
direction  of  steepest  descent  at  x^  with  respect  to  the  metric  . Thus 

if  H = I , the  unit  matrix,  we  would  have  the  familiar  first  order  steepest 
descent  iteration  while  H = G 1 would  be  second  order  steepest  descent  (if  G 
is  positive  definite).  Because  H changes  from  one  iteration  to  the  next,  the 
method  was  also  termed  'variable  metric'. 

Now  it  is  well  known  that  Newton's  iteration  (2-3)  can  be  unstable  even  in 
one  dimension  so  the  parameter  a is  introduced  to  prevent  divergence.  The 
vector  Hg  is  now  regarded  as  a direction  with  a determining  how  far  along 
it  to  go  before  re-evaluating  g and  updating  H . It  is  therefore  expected 
that  as  a minimum  is  approached  F will  become  locally  quadratic,  H -►  G * and 
a ->  1 . It  should  already  be  apparent  that  the  efficiency  of  the  process  which 
determines  each  will  be  a major  factor  affecting  the  efficiency  of  the 

algorithm  as  a whole.  Likewise  the  procedure  for  improving  H will  have  to 
guard  against  ill-conditioning  of  the  resulting  matrix. 

In  the  next  section  we  consider  methods  by  which  H may  be  updated  with 
the  emphasis  initially  on  generating  the  (constant)  inverse  Hessian  of  a quadratic 
form  in  a finite  number  of  steps.  Simultaneously  we  will  discover  practical 
criteria  for  determining  a and  tests  to  protect  the  conditioning  of  H many  of 
which  are  immediately  applicable  to  more  general  objective  functions. 

3 MATRIX  UPDATING  FORMULAE 

As  implied  in  the  previous  section,  the  theory  behind  the  formulae  for 
updating  H is  only  strictly  applicable  to  quadratic  objective  functions.  It 
is  hoped  that  as  the  minimum  of  a non-quadratic  function  is  approached  this 
approximation  will  become  increasingly  accurate  and  that  ultimately,  quadratic 
convergence  will  be  achieved.  Thus  in  what  follows  we  shall  assume  a purely 
quadratic  F in  which  case  the  true  Hessian  matrix,  G , does  not  change  with 
x . The  problem  is  then  to  construct  G or  G * using  only  x and  g(x)  . 

(F(x)  is  not  needed  for  this  purpose.)  As  we  proceed  we  will  remind  ourselves 
periodically  of  our  ultimate  goal;  to  develop  an  algorithm  which  can  safely  and 
efficiently  be  applied  to  general  objective  functions. 

3. 1 Solving  quadratic  problems  without  matrix  updating 

Beginning  at  x^  where  the  partial  derivative  vector  is  g^  we  step 
to  other  points  x^  , ...x^,  ...  evaluating  g^ , ...g^,  ...  as  we  go. 
Because  F is  quadratic  we  have 
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Writing 


g(i+,) 


+ G(x 


(i+1) 


_g(i+,>  - g(i) 


(3-1) 


(3-2) 


this  becomes 


(3-3) 


It  is  then  intuitively  obvious  that  by  generating  n vectors  we  would 

be  able  to  determine  G from 

Q = GP  (3-4) 

where  the  columns  of  Q,P  are  the  vectors  . Of  course,  the  P^1^ 

must  be  linearly  independent  so  that  P can  be  inverted.  The  solution  to  our 
minimisation  problem  is  then  obtained  in  one  further  step,  given  by  (3-1), 


g(n+1) 


+ 


G(x 


(n+1) 


0 . 


Therefore 


x* 


= x(n+1) 


(n)  -1  (n) 

x “ G g 


(3-5) 


From  here  on  we  rely  on  the  reader  to  remember  that  x,  g,  p,  q are 
vectors.  New  vectors  ( eg  d,  y,  z)  will  be  underlined  when  they  first  appear 
but  not  subsequently. 

So  what  has  been  achieved  so  far?  We  have  not  mentioned  the  notion  of 
approximating  G (or  G *)  and  updating  it  after  each  iteration.  We  have  merely 
shown  that  knowledge  of  the  gradient  vector  at  (n  + I)  suitably  chosen  points 
is  sufficient  to  completely  determine  the  second  derivative  matrix  of  a quadratic 
function.  By  'suitably  chosen'  we  mean  that  the  vectors  (x^+1^  - x^)  must 
be  linearly  independent.  It  is  easy  enough  to  predetermine  a satisfactory  set  of 
points  and,  in  particular,  the  simplest  and  most  obvious  choice  is  to  make  P 
the  unit  matrix  by  stepping  along  the  coordinate  axes  in  turn,  ie 

0 k > i 

i = 0 , 1 . . .n  . 

1 k £ i 
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Hence  Q = G but  a matrix  inversion  is  still  necessary  in  order  to  make  the 
final  step  to  x*  . 

3.2  Rank-1  updating  and  positive  definiteness 

The  above  procedure  is  not  very  practical  when  the  objective  function  is 

non-quadratic  because  G varies  with  x . A more  general  strategy  for 

choosing  the  p^  is  needed  while  retaining  the  linear  independence  property 

4 

on  quadratic  F . The  important  contribution  made  by  Davidon  and  developed  by 

3 . . . 

Fletcher  and  Powell  was  to  recognise  that  the  Newton  iteration  (2-2)  could  be 

used  and  to  deduce  automatic  methods  for  generating  and  choosing  . 

In  particular,  Davidon  announced  a formula  by  which  could  be  updated  once 

(i)  (i+1 ) , (i+1)  , , , 

a , x and  g have  been  evaluated. 

Two  important  properties  of  the  Hessian  matrix  (or  inverse)  of  a general 

T 

function  are  symjnetry  and,  at  a minimum,  positive  definiteness  (£e  u H*u  > 0 
for  any  non-zero  vector  u ).  We  will  return  to  the  second  of  these  later. 
Meanwhile,  the  first  property  leads  us  to  investigate  the  simplest  possible 
symmetry  preserving  update,  the  addition  of  a symmetric  matrix  of  rank-1. 


H(i+1> 


„(i)  (i)  (i)  (i)T 

H + a z z 


(3-6) 


For  quadratic  F we  know  we  can  deduce  the  correct  Hessian  in  n steps. 

It  is  desirable  that  a strategy  which  amends  an  approximation  should  do  likewise, 

ie 


Now  if  we  can  find  n 
each  satisfying 


linearly  independent  vectors 


H(n)Gv(j) 


V 


(j) 


j = 0, 1 .... (n  - 1) 

(3-7) 


then  we  can  be  sure  that  H^n^  = G * because  only  the  unit  matrix  has  all  unit 
eigenvalues.  A natural  choice  for  v^  is  p ^ and  so  with  (3-3),  (3-7) 
becomes 


H<n>q(i>  , p(j> 


j = 0,1... (n  - 1)  . (3-8) 


Hence  if  we  make 


H(i+1) 


satisfy 


. P<i> 


j i 


(3-9) 
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then  = G * is  guaranteed  and  the  minimum,  x*  will  be  reached  on  the  next 

iteration.  Note  that  unlike  (3-5)  this  strategy  does  not  involve  any  matrix 
inversions . 


It  happens  that  imposing  requirement  (3-9)  for  j = i on  equation  (3-6) 
gives  a unique  formula  and  that  (3-9)  is  thereby  simultaneously  satisfied  for 
j < i . 


HWtl,q(i) 


p«>  . 


(3-10) 


It  can  easily  be  shown  that  the  only  symmetric  rank-1  update  is  characterised 
by  1 


,<*>  - p(i)  - H^q(i) 


(3-11) 


a 


(i) 


1 

(i)T  (i) 
q z 


and  (by  induction)  that  relations  (3-9)  automatically  hold  for  j < i (see 
Luenberger^,  pp  193-194).  Furthermore,  Murtagh  and  Sargent^  showed  that  the 
quasi-Newton  iteration  (p^  = - H^g^)  generates  directions  which  are 
linearly  independent  if  conditions  (3-9)  are  satisfied.  Thus  for  quadratic 
functions  we  have  a virtually  complete  algorithm,  (A-l),  as  follows: 


(1) 

Choose  x(0) , H(0) ( = I, 

say).  Evaluate  . Set  i = 0 

(2) 

Calculate  p^  . Hence 

x(i+,)  = x(i)  ♦ P(i>  . 

Note  that  p^^  to  p^n 

could  still  be  pre-determined 

while  p(°)  = - H(n)g(°) 

. 

(3) 

If  i = n , x(l+1)  = x* 

so  stop,  otherwise 

(4) 

Evaluate  g^+1^  . Hence 

(i) 

q 

(5) 

Update  H(l)  to  H(l+1) 

using  (3-6)  and  ( 3— 11). 

(6) 

Set  i = i + 1 ; go  to  (2) . 

This  algorithm  is  not  practical  even  on  quadratic  functions  because  the 
conditioning  of  ^ has  not  been  monitored.  We  mentioned  above  that  H* 

is  positive  definite  so  it  would  seem  a worthwhile  precaution  to  start  with 

positive  definite  and  to  maintain  so  subsequently.  A weakness  of 

the  rank-1  formula  is  thereby  exposed;  it  cannot  guarantee  positive 

definite  even  on  quadratic  functions.  However,  Murtagh  and  Sargent  noted  that 
G would  still  be  obtained  if  n updates  could  eventually  be  performed, 
ie  the  n linearly  independent  directions  need  not  be  generated  consecutively. 

-<i) 


An  obvious  sufficient  condition  for  H^1+'^  to  be  positive  definite  is 


> 0 in  (3-6),  ie  from  ( 3— 11) 


10 


q (p  - Hq)  > 0 . 


(3-12) 


Otherwise  we  set  H^+l^  = and  continue.  Unfortunately  there  is  no 

guarantee  that  inequality  (3-12)  will  be  satisfied  n times  even  if  F is 
quadratic  ( eg  Luenberger’’ , p 214).  A further  sufficient  condition  for  H^+1^ 
to  be  positive  definite  was  demonstrated  by  Murtagh  and  Sargent 


ag  z < 0 . 


(3-13) 


We  can  combine  these  conditions  so  that  H<‘  is  updated  if 


gU+1)T(p  " Hq)  > g^T(p  " Hq) 


g^T(p  - Hq)  > 0 . 


(3-14) 


To  summarise,  the  great  virtue  of  the  rank-1  formula  is  that,  on  quadratic 
functions,  n repetitions  of  iteration  (2-2)  with  a = 1 will  potentially 
completely  determine  the  Hessian  matrix.  The  over-riding  disadvantage  is  that 
H may  become  ill-conditioned  and  in  attempting  to  prevent  this  happening  we 
risk  the  algorithm  failing  to  achieve  H -*■  G * at  all.  We  are  not  dismayed 
by  losing  n-step  convergence  but  the  possibility  of  complete  failure  is  intoler- 
able. Of  course,  it  is  quite  likely  that  a path  to  the  solution  which  maintains 
H positive  definite  goes  a long  way  round  but,  in  addition  to  enabling  the 
conditioning  of  H to  be  controlled,  such  a route  has  a feature  which  is 
reassuring  on  general  functions,  namely  that  a decrease  in  F can  be  obtained 
at  each  step.  This  follows  from  the  Taylor  expansion, 

F(x  + p)  = F (x)  + pTg  + JpTGp  + ...  , 


p = - aHg 


(3-15) 


,a+o 


- agTHg  + ia2gTHGHg  + 


Therefore  F - F = - ag  Hg  + Jot  g HGHg  + ...  and  by  choosing  a 

sufficiently  small  and  positive  a decrease  in  F is  assured.  All  quasi-Newton 
algorithms  for  unconstrained  minimisation  strive  to  keep  H positive  definite. 
Likewise,  practical  algorithms  have  in  common  that  rank-2  updating  formula  are 
used  in  preference  to  rank- 1 , initially  because  a strategy  was  available  which 


ensured  that  H could  be  modified  in  safety  on  every  iteration.  We  now  turn  our 
attention  to  rank-2  modification  rules  and  confirm  their  various  properties. 

3.3  Rank-2  updates 

The  general  form  is 


,(i+0 


H(i)  + a(i)z(i)z(i)T  + b(i)y(i)/i)T 


(3-16) 


and  there  is  now  an  infinity  of  solutions  which  satisfy  the  conditions  (3-9)  so 
it  is  possible  to  specify  additional  desirable  features  and/or  to  consider 
families  of  rank-2  formulae.  In  particular,  if  we  could  select  a set  of  n 
directions  d^  which  were  mutually  conjugate  with  respect  to  the  Hessian,  G , 
of  a quadratic  function  then  not  only  would  = G * but  also  x^  = dx*  and 

an  iteration  would  have  been  saved.  From  here  on  d^  represents  direction  of 
search  while  p^  is  the  actual  step  taken,  ie 


(i)  (i+1)  (i)  _ (i)  (i) 

p =x  -x  = a d 


(3-17) 


The  conjugacy  condition  is  therefore 


d(l^TGd(^  = 0 


1 * i 


or 


p(i)Vj)  = 0 


j * i 


(3-18) 


With  (3-3) 


p(i>Tq(j>  = 0 


j * i 


(3-19) 


In  particular,  from  (3-15) 

p(i+i)Tq<i)  . . 0(i*i)ga*i)iH(ui)qo) 


- - aa+i>e<i*»Tp«> 


from  (3-9). 


Thus,  the  conjugacy  condition  becomes 


g 


(i+l)Td(i) 


0 


(3-20) 


which  is  purely  the  condition  that  the  function  F be  minimised  along  the 


12 


direction 

d^  . This  is  a key  observation  and 

suggests  the  following 

practical 

algorithm  (A— 2) 

(1) 

d(i)  * - H(i)g(i)  . 

( 3— 15a) 

(2) 

Find  a(l)  such  that  x(l)  + a(l)d(l) 

minimises  F along  the 

direction  d^^  . Hence  get  g^  + 

rt(i) 

. q 

(3) 

Update  H to  satisfy  (3-9) . 

(4) 

Set  i = i + 1 ; go  to  (1). 

The 

process  of  selecting  is  called  a 

'line  search'  and  the  term 

'exact  line  search'  denotes  that  (3-20)  is  satisfied. 


We  now  show  that  (3_9),  (3-15a)  and  (3~20)  together  guarantee  (3~19). 

The  proof  is  inductive.  First  we  observe  from  (3-18)  that  only  j < i need  be 
considered.  Therefore  we  have  to  prove  that 


d (i+ ! )Tq (j ) 


0 


j < i + 1 (3-21) 


assuming  that 


n(i)T  (j) 

p q 


d(i)Tq(j) 


0 


for  j < i . 


We  have  already  noted  that  (3-20)  establishes  ( 3—2 1 ) for  j = i so  only  j < i 
remain  to  be  proved. 

Now,  using  (3-15a)  and  the  definition  of  q((3-2)) 
d(i  + D = - H(i+,)(g(i)  + q(i)) 

= - H(l+1)g(l)  - p(l)  from  (3-9) 

'Lq 

d<i*»Tq(j)  . . g(i)TH(i*l>q(j)  . pa>Tq(j>  _ 

The  second  term  is  zero  by  assumption,  while  from  (3~9) 

g<i)TH<i+,)q<i)  = g«)Vj)  = g(i>TH(i>q«>  j < i 

d(i+l)Tq(j)  - d(l)Tq(j)  from  (3-15a) 

=0  by  assumption  when  j < i . 


13 


( 1 )T  (0) 

To  complete  the  proof  we  observe  that  p q = 0 as  a consequence  of  the 
first  exact  line  search. 

Thus  with  a quadratic  objective  function  algorithm  (A-2)  will  generate  a 
sequence  of  vectors  which  are  mutually  conjugate  with  respect  to  the 

true  Hessian  (rather  than  just  being  linearly  independent)  and  the  solution  x* 
will  be  obtained  in,  at  most,  n steps  providing  that  the  approximation  H to 
G 1 is  updated  after  each  iteration.  This  becomes  quite  clear  when  we  use 
( 3— 1 5a)  to  express  the  conjugacy  relations  as 
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and  by  (3-9) 


g(i)TH(i)q(j) 


g(i)Vj)  = 0 


J < 1 


J < 1 


(3-22) 


/^\ 

Thus  each  g is  orthogonal  to  all  the  previous  steps.  After  n linearly 

• . . • • (0)  (n-1)  , (n) 

independent  iterations  p , ....p  we  must  have  g = 0 . 

As  yet,  we  have  not  implied  any  particular  method  of  updating  except 

to  insist  that  relations  (3-9)  are  satisfied.  Hence  even  the  rank-1  formula 
could  be  used  although  there  is  no  virtue  in  so  doing  because  (3-20)  is 
insufficient  to  guarantee  that  the  matrix  update  can  proceed.  However,  we 
shall  show  presently  that  (3-20)  is  sufficient  to  ensure  that  H^+'^  is  posi- 
tive definite  when  a formula  of  rank-2  is  employed  even  on  non-quadratic 
functions . 

First  let  us  derive  a simple  rank-2  update;  taking  j < i , (3-16)  and 
(3-9)  give 

(i-M)  (j)  _ (i)  (j)  (i)  (i)  (i)T  (j)  (i)  (i)  (i)T  (j) 

H q -Hq  + a z z q +byy  q 

_ „<j> 


Therefore 


i(i)z(i)  (i)T  (j)+  b(i)  (i)  (i)T  (j)  „ Q . 


Now  (3-19)  shows  immediately  that  z^  = p^  would  make  one  term  zero  while 
from  (3-9)  we  can  get 


(i)TH(i).(j)  . <i)T  (j) 


0 


J < i 


,(i) 


H^q^  would  conveniently  zeroise  the  other  term. 


suggesting  that  y 
Thus  (3-16)  becomes,  dropping  the  superscript  for  clarity, 
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(i+1)  T T 

HV  ' = a + app  + bHqq  H . 

a and  b are  determined  by  considering  j = i in  (3-9) . 

(i+1)  T T 

H q = Hq  + app  q + bHqq  Hq  = p 


Therefore,  a = ; b 1 — . (3-23) 

P q q Hq 

4 

These  simple  formulae  for  a,  b,  y,  z do  indeed  correspond  to  Davidon's  original 
proposals.  Note  that  p and  Hq  have  to  be  linearly  independent  for  the 
update  to  be  of  rank-2.  Note  also  that  conditions  (3- 19)  and  (3_9)  are  now 
interlinked,  with  (3-20)  providing  the  continuity.  Thus  if  the  line  search  is 
not  exact  then  the  next  search  direction  will  not  be  conjugate,  the  following 
update  will  not  satisfy  (3-9)  and  finite  convergence  is  theoretically  lost  even 
on  quadratic  functions.  However,  an  exact  line  search  by  computer  is  impossible 
anyway  and  is  intuitively  undesirable  on  non-quadratic  functions  where  conjugacy 
and  n-step  convergence  are  irrelevant  except  perhaps  very  near  the  solution. 

Of  more  importance  is  that  a criterion  can  be  deduced  for  maintaining  H^+'^ 
positive  definite  and  that  the  exact  line  search  strategy  will  cause  this 
criterion  to  be  automatically  satisfied.  Hence  we  can  use  the  descent  property 
of  positive  definiteness  with  inexact  searches  to  manoeuvre  into  the  (quadratic) 
vicinity  of  a minimum  after  which  exact  searches  would  guarantee  convergence  in 
0(n)  more  iterations.  We  now  derive  the  condition  under  which  Davidon's  rule 
will  give  H^+l^  positive  definite  if  H^  is  • 


The  formula  is 

= h + E^-J&LH 
p q q Hq 


(3-24) 


We  need  to  show  that 


T (i+1) 
u H u 


Rhs 


> 0 for  arbitrary  u . Equation  (3-24)  gives 

JHu  ♦ hIei eIe  - uZnaalHu  # 

p q q Hq 


. . . . . . T 

Since  H is  positive  definite  we  can  define  a matrix  R such  that  R R = H . 

Writing  6 = Ru  ; e = Rq  gives 
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T T T T 

Rhs  = 6*6  + ^AP.  u)  . (6  el(e  6) 

P q e e 


, T y ,.T.W  T . ,.T  .2 

= (u  P + (6  6)(e  e)  ~ (6_e) 

T T 

p q t t 

The  second  term  is  positive  by  the  Cauchy-Schwartz  inequality.  Hence  a 
sufficient  condition  for  H^+*^  to  be  positive  definite  is 


T 

p q > 0 


(3-25) 


But 


T (i)T . (i+1)  (i). 

P q = P (g  “ g ) 


With  exact  searches  the  first  term  is  zero  by  definition  while  (3-15)  gives 

T T T 

pq  = _Pg  = ctgHg>0 

because  a > 0 and  H is  positive  definite. 

Thus  an  exact  line  search  will  guarantee  that  Davidon's  rule  gives  H^1+'^ 
positive  definite  while  (3~25)  is  the  test  which  must  be  applied  when  the  search 
is  inexact.  Note  that  condition  (3“12)  for  the  rank-1  formula  is  rather  more 
demanding.  However,  the  second  test  of  ( 3— 14)  can  be  rewritten 


ie 


gT(p  + Hg)  > gTHg 


(i+1) 


(1  - a)g  Hg  > - g 


If  an  exact  search  is  performed  then  the  condition  is  fulfilled  if  a < 1 . Of 

course  there  is  no  point  in  employing  exact  searches  with  the  rank-1  formula 

anyway  because  its  virtue  lies  in  its  potential  to  satisfy  (3-9)  by  taking 

steps  of  arbitrary  size  when  F is  quadratic.  On  the  other  hand  for  quadratic 

T 

problems  (3-3)  shows  that  p q > 0 for  any  a # 0 so  the  rank-2  update  will 
succeed  following  steps  of  arbitrary  size  but  convergence  in  n iterations  is 
no  longer  assured. 
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This  completes  the  basic  theory  of  unconstrained  optimisation  algorithms. 

In  particular,  we  have  seen  how  Davidon’s  rank-2  formula  arises  naturally  from 

the  desire  to  generate  a set  of  vectors  mutually  conjugate  with  respect  to  the 

unknown  Hessian  matrix,  G of  a quadratic  function  of  n variables  and  how  G 

can  thereby  be  deduced  in  n iterations  by  varying  the  approximating  metric, 

H . The  exact  line  search  is  a consequence  of  this  aim  and  is  sufficient  to 

ensure  that  H remains  positive  definite  and  hence  that  the  objective  function 

will  decrease  at  every  step.  It  should  be  stressed  that  while  the  symmetric 

rank-1  formula  satisfying  (3-9)  is  unique,  Davidon's  rank-2  update  is  but  a 

simple  example  from  an  infinity  of  possibilities.  In  practice,  line  searches 

are  never  carried  out  exactly  although  it  is  intended  that  they  will  become  more 

precise  as  a minimum  is  approached.  Thus  it  becomes  important  to  study  the 

behaviour  of  rank-2  formulae  when  slack  searches  are  used.  Such  work  has  led 

7 8 9 

to  the  general  adoption  of  the  BFS  rule  (Broyden  , Fletcher  , Shanno  independ- 
ently (]  970),)  which  looks  rather  more  complicated  than  (3-24). 


H(i+1) 


H + (P  ~ rHq) (p  ~ rHq)T 
rq  p 


HqqTH 

T T 

q Hq  + q p 


(3-26) 


where 


r = 


T 


T T 
q p + q Hq 


With  exact  searches  it  would  generate  the  same  sequence  of  points  as  (3-24)  and 
it  is  subject  to  the  same  positive  definite  criterion  ((3-25)). 

Finally  in  this  section  we  can  state  the  algorithm  (A-3)  on  which 
Mathematics  and  Computation  Department  FORTRAN  subroutine  M0402  is  based. 

(1)  Set  i = 0 ; choose  x^  and  (=  I normally)  and  evaluate  g^  . 

(2)  Set  d(l)  = - H(l)g(l^  . 

(3)  Perform  some  sort  of  line  search  to  get  p^\  x^+5^  . 

(4)  Evaluate  g^+1^  , q^  . Obtain  H^  from  (3-26)  if 

p(i)Tq(i)  > q . Otherwise  set  H^+1^  * . 

(5)  Terminate  if  the  line  search  fails  to  make  progress  (or  reset  H -*■  I 
and  try  to  continue)  or  if  the  convergence  criteria  are  satisfied. 
Otherwise  set  i = i+I  and  go  to  (2). 

Before  discussing  more  recent  algorithms  we  now  look  briefly  at  families 
of  rank-2  matrix  updating  formulae. 
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3.4  Families  of  rank-2  updates 

Davidon’s  formula  corresponds  to  the  specific  selections  y = Hq  ; z = p 
in  (3-16)  with  a and  b determined  from  conditions  (3-9)  with  j = i . Clearly 
y and  z could  each  be  any  linear  combination  of  p and  Hq 


y = <}>p  + Hq  ; z = p + i^Hq 


(3-27) 


and  conditions  ( 3—9)  would  automatically  be  satisfied  for  j < i (for  quadratic 
F and  with  exact  line  searches) . The  j = i case  can  be  identically  fulfilled 
by  (3-28)  instead  of  (3-16) 


Ha*‘>  - h*i£-5u£ 

z q y q 


(3-28) 


The  combination  of  (3-27)  and  (3-28)  gives  the  two  parameter  family  first  intro- 
duced by  Huang To  be  strictly  accurate,  Huang’s  family  has  three  parameters, 
the  third  one,  p , multiplying  the  second  term  in  (3-28).  We  have  ignored  it 
here  because  H^+*^q^  = p^  is  only  obtained  when  p = 1 . We  mention  it 
to  indicate  the  type  of  variation  which  has  been  considered  to  try  and  improve 
the  behaviour  of  algorithms  away  from  the  minimum  of  non-quadratic  functions. 

For  example  Biggs11  allowed  p to  vary  on  every  iteration  choosing  it  to  in 

. . .12 

some  sense  take  account  of  the  non-quadraticity  of  F . Dixon  showed  that  on 
general  functions  all  members  of  Huang's  family  having  the  same  p generated 

providing  the  line  search  located  the  first 
minimum  along  dVJ'/  exactly.  Therefore,  the  practical  choice  of  4>  and  ^ in 
(3-27)  will  depend  on  the  behaviour  when  slack  searches  are  employed. 


the  same  sequence  of  points  x 

,(i) 


Now,  it  will  be  observed  that  (3-28)  is  not  necessarily  a symmetric 
update.  If  we  impose  the  symmetry  requirement  then  the  family  reduces  to  one 
free  parameter.  Putting  (3-27)  in  (3-28),  we  find  that  the  terms  which  are  not 
automatically  symmetric  are 


«l>pqTH 


‘frHqp 


T T 

(p  + \|>Hq)  q (4>P  + Hq)  q 


The  necessary  condition  for  symmetry  is  then 


T T 

^(<J>P  + Hq)  q + <{) (p  + ipHq)  q = 0 , 
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* = :L*E-9 t-  • (3-29) 

(4>P  + (<(>  + DHq)  q 

This  resulting  one  parameter  family  of  formulae  (the  symmetrised  Huang  family) 

13 

is  equivalent  to  Broyden's  family  and  we  have 

= 0 ; \J>  = 0 Davidon’s  formula 


“ 1;  = - 1 


Rank-1  formula 


; = 


_P_3_ 


T T„ 

p q + q Hq 


BFS  formula  . 


(3-30) 


Alternatively,  when  we  combine  (3-27)  with  (3-16)  and  determine  a and  b 
from  H 1 *^q  1 = p ^ we  get  a two  parameter  family  of  symmetric  rank-2 


corrections : 


jj(i+1 ) _ H + (<|>  + l)zzT  _ (<|>  + l)yyT 

(l  - <M)zTq  (i  - <M)yTq 


(3-31) 


Broyden's  family  is  regained  by  setting  4>  = 0 , ie 


H(i+,)  = H + (P  + 0Hq)(p  ->■  ijiHq)1  (»  + l)HqqTH 

(P  + 'I'Hq ) Tq  qTHq 


(3-32) 


By  repeating  the  analysis  following  equation  (3-24)  it  is  easily  shown  that 


H^+1^  is  necessarily  positive  definite  if 


is,  providing 


pTq  > 0 ; < It  ( 0 . (3-33a,b) 

q Hq 

Thus  Davidon's  formula  is  at  one  end  of  the  safe  range  while  BFS  is  placed 
intermediately  and  the  rank-1  update  is  indefinite  (see  (3-30)).  Looked  at  in 
this  way  it  might  perhaps  be  suspected  that  BFS  would  be  better  behaved  than 
Davidon. 

9 

Shanno  obtained  the  one  parameter  family  from  ( 3—  31)  by  setting  i|i  = 0 ; 

♦ + 1 ■ t . He  showed  that  the  smallest  eigenvalue  of  is  maximised  by 

the  choice  t = » and  that  the  spectral  condition  number  of  is  then 

also  favourable  (Shanno  and  Kettler  ).  In  the  latter  paper  it  was  further 
argued  that  t ■ « preserves  certain  eigenvalue  properties  during  updating  so 
that  in  effect  the  BFS  correction  could  be  deduced  directly  by  recognising  (3“34) 
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as  a desirable  property  of  and  imposing  it  on  the  family  (3-32)  assuming 

exact  line  searches  (ie  (3-20)): 


(i+ 1 ) T (i+ 1 ) (i+1) 

g H g 


( i+ 1 ) T (i)  (i+l) 
g H g 


(3-34) 


Of  course,  the  theory  is  again  only  applicable  to  quadratic  F . 

The  historical  development  of  matrix  modification  schemes  should  by  now  be 

apparent.  Davidon's  formula  (usually  referred  to  as  DFP  to  acknowledge  its 

3 

practical  implementation  by  Fletcher  and  Powell  )was  used  for  many  years  until 
it  was  realised  that  whole  families  of  updating  formulae  existed.  Meanwhile  as 
the  exact  line  search  policy  was  relaxed,  DFP  tended  to  run  into  difficulties 
which  were  attributed  to  ill-conditioning  of  H . Although  no  really  convincing 
theoretical  results  concerning  inexact  searches  on  non-quadratic  functions  have 
appeared,  practical  experience  has  confirmed  that  BFS  is  a superior  tool.  In 
the  last  five  years  the  algorithm  (A-3)  has  been  revised  to  make  it  numerically 
more  stable  and  more  efficient  in  terms  of  function  evaluations.  The  manner  in 
which  matrices  are  now  recurred  is  briefly  described  in  the  next  section. 


3.5  More  recent  developments 


It  has  been  recommended,  initially  by  Gill  and  Murray that  the  Hessian 
itself  be  approximated  rather  than  its  inverse.  It  is  interesting  to  follow 
through  the  developments  of  sections  3.2  and  3.3  in  the  revised  form.  Let 
be  the  approximation  to  G . The  search  direction  is  then  obtained  as  the  solu- 
tion of 


B(i)d<i)  - - g(i)  . 


(3-35) 


. . . T 

In  practice  B is  stored  in  LDL  form  and  the  unit  lower  triangular 
matrix  L and  diagonal  matrix  D are  updated  separately.  Any  ill-conditioning 
of  B tends  to  be  reflected  in  D which  can  be  rescaled  when  necessary  to 
reduce  its  condition  number.  Any  sparsity  in  B will  show  up  in  L whereas  it 
would  not  in  H but  this  is  a feature  which  will  not  concern  us;  suffice  it 
to  say  that  the  revised  formulation  is  more  adaptable  to  structured  problems. 

It  is  also  a relatively  simple  matter  to  ensure  that  B^+l^  is  positive  definite 
whereas  test  (3-25)  does  not  guard  against  rounding  error  and  there  is  no  way  of 
knowing  what  effect  this  has  on  . 


32 


T T 

u LDL  u > 0 


for  arbitrary  u(^  0)  . 
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T 

Defining  v = L u it  is  obvious  that  if  D is  positive  definite  (te  consists 
of  all  positive  elements)  then  B will  be  also.  However,  it  must  be  remembered 
that  the  conditioning  of  D is  only  a guide  to  that  of  B inasmuch  as 

n n 

TK-TT  ■ det<B)  • 

k= 1 k=  1 

It  is  therefore  quite  easy  to  construct  examples  where  all  the  are 

of  similar  magnitude  but  the  eigenvalues  of  B , Ag^  , differ  vastly. 

Formulae  complementary  to  H-updates  can  be  obtained  by  making  the 
transformation 

H B ; p -*•  q ; q-+p. 

The  unique  rank-1  correction  ( (3—6)  with  (3-11))  is  then 

B(i+0  _ B + (q  ~ Bp)  (q  - Bp) T 

pT(q  - Bp) 

for  which,  of  course,  the  property  that  if  H = B ' then 
holds.  This  is  not  true  of  complementary  rank-2  updates, 
of  Hessian  updating  formulae  is  that  matrix-vector  products  can  be  eliminated 
because 

Hq  -*■  Bp  = - ag 

and  the  computation  is  more  accurate  as  a result.  An  interesting  feature  of 
(3~37)  is  that  the  obvious  positive  definite  preserving  condition  is 

pT(q  - Bp)  > 0 (3-38) 

which  is  not  the  same  as  the  denominator  condition  for  H^+'^  ( 3— 12)  but  does 
correspond  with  (3-14b)  which  was  effectively  obtained  by  Murtagh  and  Sargent^ 
but  rather  more  laboriously 

pT(q  - Bp)  = a(dTq  + pTg) 

T T 

= cx(-  g Hq  + p g) 

T 

= ag  (p  - Hq)  . 


(3-36) 

(3-37) 

H(i+1)  = B(i+,) 

An  important  feature 
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Condition  (3-12)  must  also  be  usable  when  B is  recurred  but  it  is  less 
convenient  to  apply  as  (B^)  *g^+*^  has  to  be  specially  computed. 


An  even  more  interesting  revelation  occurs  when  we  apply  (3-36)  to  the 
DFP  rule  (3-24).  With  (3-35)  we  find 


B(i+1) 


B + 


T (i)  (i)T 

+ 1 g 

T T (i) 

P q d g 


(3-39) 


Now,  if  we  apply  Householder's*^  modification  formula  twice  to  (3- 39)  we  will 
obtain  the  equivalent  H-update.  The  result  is  (3-26),  the  BFS  update,  and 
not  (3-24).  Repeating  briefly  the  analysis  of  section  3.3,  the  generalised 
rank-2  update  is 


B(i+1) 


,(i) 


♦ c(i)v(i)v(i)T  ♦ e(i)w(i>w(i)T 


(3-40) 


and  conditions  (3-9)  become 


R(i+I)n(j) 

B p 


j < i . (3-41) 


Combining  (3-40)  and  (3-41)  gives 


cvv 


Vj)  + ewwTp(j)  - 0 


J < t 


The  obvious  vector  choices  (from  (3- 19)  and  (3-22))  are  now 


q J w = g 


(i) 


(3-42) 


while  the  j = i case  gives 
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p q 


d 


„T  (D 

a g 


Thus,  if  the  analysis  had  been  carried  out  in  the  first  place  in  a B-updating 
framework,  the  BFS  formula  would  have  been  obtained  as  the  simplest  rank-2 
update  which  generates  conjugate  directions  with  exact  line  searches  and  con- 
structs the  true  Hessian  of  a quadratic  function  within  n iterations.  It  there- 
fore seems  a rather  curious  coincidence  that  researchers  apparently  arrived  at 
the  BFS  formula  for  updating  H by  employing  fairly  sophisticated  arguments 
(as  indicated  in  section  3.4).  Why  should  the  'best'  formula  so  derived  happen 
to  be  the  most  elementary  solution  of  the  complementary  problem?  It  is  all  the 
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of  a normal  BFS  algorithm  has  proved  extremely  satisfactory  suggesting  that  the 
difficulties  alluded  to  have  been  overcome  quite  effectively  by  other  means. 

Before  leaving  the  subject  of  matrix  modification  schemes  we  should  remark 

that  our  derivation  of  the  DFP  update  does  not  coincide  with  Davidon's^  line 

of  thought.  In  fact,  he  was  seemingly  unaware  of  its  conjugate  direction  property 

and  proposed  a more  complicated  algorithm.  The  initial  extension,  clarification 

3 

and  simplification  of  his  basic  ideas  was  the  work  of  Fletcher  and  Powell  . Any 
student  of  innovative  minds  is  therefore  referred  to  the  original  paper! 

4 FINITE  DIFFERENCE  DERIVATIVES 

4. 1 Motivation  and  efficient  implementation 

In  section  3 it  was  largely  assumed  that  exact  computations  could  be 
performed.  In  particular,  we  never  doubted  that  the  partial  derivative  vector 
g could  be  determined  precisely.  Unfortunately,  it  is  frequently  the  case  that 
analytic  derivatives  of  F are  not  available  in  which  case  finite  difference 
approximations  are  normally  the  best  substitute  although  occasionally  the  nature 
of  the  problem  is  such  that  a method  which  does  not  involve  gradient  calcula- 
tions has  to  be  employed.  Our  computer  programs  M0402  and  M21UNC  assume  that 
finite  differences  can  be  used  and  there  is  a version  of  each  for  use  when 
explicit  expressions  for  g can  be  supplied.  Either  way,  it  is  assumed  that 
the  work  involved  in  evaluating  g greatly  exceeds  that  required  to  get  F so 
that  g is  only  computed  at  the  end  of  a line  search  rather  than  every  time  F 
is  calculated.  Thus  quadratic  rather  than  cubic  interpolation  is  used  in  the 
one-dimensional  minimisation  algorithm. 

Now,  the  efficiency  of  an  optimisation  algorithm  is  judged  mainly  by  the 
number  of  function  evaluations  (NFE)  required  to  solve  a particular  problem. 

With  finite  differences  this  means  that  the  number  of  gradient  calls  has  to  be 
kept  down  and  that  forward  differences  (4-1)  which  involve  n further  function 
evaluations  are  to  be  preferred  to  central  differences  (4-2)  which  involve  2n 
although  the  latter  are  more  accurate  and  will  be  essential  near  the  solution. 


F (x  + Ax)  - F (x) 

Ax  32F 

Ax 

2 a 2 

L OX  J 

F (x  + Ax)  - F (x  ~ Ax)  _ Ax^  33F 

^ * 2Ax  6 3 

L 3x 


(4-1) 
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Thus  for  quadratic  F central  differences  are  exact  except  for  rounding 
error.  The  reader  should  now  begin  to  appreciate  two  dilemmas  facing  us: 

(a)  Exact  line  searches  waste  function  evaluations  but  tend  to  reduce  the 
number  of  iterations  (te  gradient  evaluations)  needed  to  solve  a 
particular  problem. 

(b)  Forward  differences  require  only  half  the  function  calls  of  central 
differences  but,  being  less  accurate,  tend  to  increase  the  number  of 
iterations  so  that  there  is  usually  still  a saving  but  less  than  a 
factor  of  two. 

M0402  therefore  has  the  fixed  strategy  of  central  differences  and  a fairly 
tight  line  search  throughout.  M21UNC  uses  forward  differences  wherever  possible 
and  allows  the  user  to  specify  the  accuracy  of  the  search  procedure.  Unfortu- 
nately, there  is  no  way  of  automatically  deciding  how  accurate  each  search 
should  be  when  x is  remote  from  the  solution  and  it  becomes  a matter  for 
experience  to  decide  whether  an  overall  slack  or  tight  search  policy  is  prefer- 
able for  a given  problem.  For  most  applications  such  optimal  program  efficiency 
will  probably  not  be  vital  but  the  opportunity  to  experiment  if  desired  is  at 
least  available  in  M21UNC. 


Before  considering  the  use  of  finite  differences  in  more  detail,  we  ought 
to  mention  the  alternatives.  In  the  mid-1960s  the  intuitive  feeling  that  perform- 
ing many  function  evaluations  in  such  close  proximity  was  inefficient  contributed 
to  the  preferential  development  of  other  algorithms.  In  particular,  Powell'^ 
found  a way  of  generating  conjugate  directions  without  using  derivatives  but 

the  method  was  very  dependent  on  accurate  line  searches  throughout.  A sophisti- 

1 8 

cated  simplex  method  was  developed  by  Nelder  and  Mead  but,  like  any  so-called 
direct  search  method,  it  converged  slowly  and  often  failed  completely.  A 
deficiency  common  to  both  these  approaches  was  their  deterioration  as  n 
increased  and  it  is  now  accepted  that  finite  difference  derivatives  represent 
the  best  investment  providing  that 

(i)  the  objective  function  is  sufficiently  smooth; 


(ii)  the  calculation  of  F is  not  subject  to  gross  errors  (causing 
1 F(x)  1 
l F(x  - Ax)/ 


F (x  + Ax) 


to  be  highly  inaccurate) , 


(iii)  the  step  size  (differencing  interval) , Ax  is  suitably  chosen. 

The  first  successful  finite  difference  implementation  was  by  Stewart 

15 


19 


although  it  was  later  pointed  out  by  Gill  and  Murray  that  Ax  should  remain 
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fixed  throughout  the  minimisation  whereas  Stewart  had  allowed  it  to  vary  with  x 
Thus  in  M0402  and  M21UNC  the  Ax  specified  by  the  user  is  adopted  throughout 
regardless  of  whether  (4-1)  or  (4~2)  is  in  use.  Like  the  line  search  accuracy 
parameter  referred  to  above,  this  involves  a compromise  and  the  opportunity 
(more  important  with  Ax  ) for  experimentation.  There  seems  to  be  a case  for 
further  research  here  because  the  argument  for  constant  Ax  is  related  to  the 
accuracy  of  q (equation  (3~2)).  One  would  imagine  that  not  updating  H (or 
B)  on  an  iteration  when  Ax  is  changed  would  be  an  adequate  precuation.  Indeed 
one  might  also  be  tempted  to  consider  such  a safeguard  when  switching  between 
(4-1)  and  (4-2). 


In  the  next  two  subsections  the  choice  of  Ax  and  criteria  governing  this 
switching  are  discussed. 

4.2  Choosing  the  differencing  interval 

The  essential  observation  is  that  the  influence  of  inaccuracy,  e , due  to 
rounding  error  in  the  calculation  of  F increases  as  Ax  decreases  while  the 
truncation  error  of  the  Taylor  expansion  increases  with  increasing  Ax  . Select 
ing  Ax  therefore  involves  balancing  these  opposing  effects.  Note  that 


where  e is  the  precision  of  computer  arithmetic  (e  — 10 


e > e _ 

m m 

ICL1900  series). 
From  (4-1) 


From  (4-2) 
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Ec  6.3 
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(4-3) 


(4-4) 


Choosing  Ax  to  minimise  E = |er|  + |et|  gives 

AXf  “ tfoj  ; 4x'  “ 


(4-5) 


and  hence  the  likely  error  in  g when  Ax  is  chosen  in  this  way  is 


Ef  - 4e/Axf 


3e/2Ax 


(4-6) 


Relations  (4-5)  and  (4-6)  respectively  suggest  that 
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For  example,  taking  e 


Ax  > Ax,  ; E > E 
c f f c 


. -9  32F  33F  , 

10  ; — j ~ = 1 gives 

3x  3x 


6.3  x 10 


1.4  x 10 


- 1.0  x 10 


Although  E^  ^ E^  the  difference  will  be  unimportant  while  g ^ E^  . The 

example  emphasises  the  need  for  central  differences  near  the  minimum  and  gives 

. . . . T 

some  insight  into  how  precisely  we  may  expect  to  locate  it.  Clearly  g g = 0 

will  not  be  reached.  Indeed  we  would  be  lucky  to  progress  beyond 


ki 

* ^(e/Ax.) 


(4-7) 


. . . 3 3 

and  even  this  might  become  optimistic  if  3 F/3x  is  especially  large. 

3 3 

Unfortunately,  we  cannot  estimate  3 F/3x  but  a finite  difference  approximation 
2 2 

to  3 F/3x  is  available  at  no  extra  cost  when  3F/3x  is  computed  by  central 
differences 

3^F  _ (F(x  + Ax)  - 2F(x)  + F(x  - Ax)  Ax^  3^F  A ,tn\ 

= ...  . (4_8) 

3x  (Ax)  U 3x 

2 

Note  that  the  rounding  error  is  here  ^ e/(Ax)  but  hopefully  the  relative  error 

2 2 

will  still  be  small.  M21UNC  outputs  3 F/3x  as  obtained  from  (4~8)  together 
with  the  diagonal  of  B(x*) . The  two  n-vectors  should  be  comparable  for  the 
free  variables. 


It  also  becomes  practical  to  compute  g by  central  differences  and  to 
set  up  the  diagonal  elements  of  or  from  (4-8) , providing  they  are 

positive,  thereby  ensuring  that  B or  H at  least  starts  off  reasonably  well 
scaled.  At  the  same  time  one  could  check  via  (4-5)  that  Ax  has  been  reasonably 
chosen  bearing  in  mind 

(i)  the  cancellation  error  associated  with  (4-8)  and 

(ii)  that  the  ideal  Ax  for  central  differences  (Axc)  is  probably  con- 
siderably larger  than  Ax^  . 

Furthermore  (as  suggested  in  (4-7))  Ax  might  need  to  be  different  for 
different  variables. 
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M0402  ignores  these  possibilities,  merely  setting  = I and  applying 

the  single  Ax  specified  by  the  user  to  all  variables.  M2IUNC  can  establish  the 
elements  of  from  (4-8)  but  like  M0402  it  adopts  the  same  user-specified 

Ax  throughout  to  generate  every  element  of  g , the  rationale  being  that 

2 2 (0)  , ... 

(3  F/3x  ) may  not  be  representative  of  later  second  derivative  behaviour 

and  that  the  user  has  chosen  Ax  accordingly.  However,  Ax  could  be  reviewed 

periodically  {eg  by  inserting  a central  difference  derivative  evaluation  every 

few  iterations).  It  might  be  interesting  to  implement  some  of  these  more 

sophisticated  procedures  for  Ax  to  see  whether  appreciable  improvement  over 

the  present  strategy  results.  Meanwhile,  it  is  gratifying  to  observe  that  the 

simplified  process  solves  problems  quite  well  although  the  efficiency  can  vary 

considerably  with  Ax  especially,  as  one  might  expect,  when  e is  large. 

Thus  if  many  similar  problems  are  to  be  tackled,  the  user  is  recommended 

to  rerun  M0402  or  M21UNC  with  a variety  of  Ax  to  determine  the  best  value. 

Effectively,  he  will  be  discovering  e which  can  be  a difficult  quantity  to 

estimate  a priori.  Whereas  the  machine  accuracy  is  clearly  the  lower  limit  to 

e a much  higher  value  might  be  more  realistic  depending  on  the  nature  and 

complexity  of  the  process  by  which  F is  calculated.  Equations  (4~5)  suggest 

-3  -4  -5 

that  on  the  ICL1906S  suitable  trial  values  might  be  Ax  = 10  ,10  ,10 

It  is  probably  not  worth  trying  to  be  more  precise. 

The  message  of  this  subsection  should  by  now  be  fairly  clear  - that 
analytic  derivatives  should  be  used  wherever  possible  but  that  numerical  approxi- 
mations will  usually  be  an  adequate  substitute  providing  the  differencing 
interval  is  chosen  sensibly  and  high  accuracy  in  the  final  solution  is  not 
demanded.  To  this  we  would  add  the  general  advice  that  when  analytic  deriva- 
tives are  programmed  they  should  be  checked  by  dif ferer-cing  before  attempting 
full  production. 

4 . 3 Single  or  central  differences? 

It  was  concluded  in  section  4.2  that  single  {ie  forward  or  backward) 
differences  are  more  economical  overall  than  central  differences  but  that  some 
means  is  necessary  of  sensing  when  the  greater  accuracy  of  the  latter  becomes 
vital.  In  practice  two  tolerances  a]»a2  » on  li-ne  search  parameter  a are 
132  prescribed.  Letting  > ct^  > 0 we  switch  to  central  differences  when  no 

lower  point  having  a > can  be  found  after  which  a line  search  is  deemed  to 
fail  if  no  acceptable  a £ is  located.  Conversely,  if  when  using  central 
differences  a step  is  made  such  that  a * , we  switch  to  (4-1)  subsequently. 
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Plainly  then  the  idea  is  to  use  (4-2)  when  little  progress  is  otherwise  being 
made,  indicating  either  that  the  minimum  is  being  approached  or  that  the  solution 
path  has  entered  an  irregular  region. 


Thus,  in  reality  the  tolerances  are  based  on 
From  (3-17) 


.(i+1) 


(i+1) 


“I  I dl 


It  is  also  desirable  to  take  into  account  the  differencing  interval  Ax 
because  this  includes  a dependence  on  the  function  accuracy  e . The  criteria 
therefore  become: 


(i)  switch  between  (4-1)  and  (4-2)  at  Oj||d||  = F^Ax)  ; 

(ii)  deem.the  line  search  to  have  failed  if  a||d||  > F^(Ax) 
produce  a sufficient  decrease  in  F . 

i 

Suitable  settings  are  Fj  = (Ax)  ; F£  = 0.1  Ax  , 

i 

(Ax)2 3  _ 0.1  Ax 

a,  - mr;a2  = wrr- 


does  not 


(4-9) 


In  addition,  it  is  possible  for  the  error  of  the  forward  difference 
approximation  to  become  large  relative  to  g so  strictly  speaking  it  should 
also  be  tested,  eg  switch  to  central  differences  if 


max 

k=  1 . . .n 
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> 0.1 


(4-10) 


2 2 

Now  3 F/3x  cannot  be  reliably  estimated  from  B except  perhaps  near 
x*  while  a difference  approximation  has  its  own  sources  of  error  and  involves 
n extra  function  calls.  Nor  can  we  reasonably  assume  that  the  two  numerator 
terms  are  of  comparable  size  so  in  practice  test  (4-10)  is  not  made  by  M21UNC. 
However,  it  does  provide  another  reason  (other  than  to  check  Ax  ; see 
section  4.2)  for  occasionally  interjecting  a central  difference  iteration. 


An  important  matter  arising  from  (4-9)  is  the  accuracy  of  the  final 
solution.  From  the  expression  for  02  it  is  apparent  that  better  than  O.lAx 
should  not  be  expected  because  the  line  search  will  not  evaluate  F at  points 
any  closer  together.  M21UNC  actually  replaces  Ax  in  the  definition  of  o2  by 


132 


29 


rain  (Ax, TOLX)  where  TOLX  is  the  user's  accuracy  request.  However,  one  would 
expect  that  having  chosen  Ax  thoughtfully  the  user  would  realise  that  setting 
TOLX  Ax  was  either  unnecessary  or  unduly  optimistic. 

At  the  start  of  this  subsection  it  was  implied  that  backward  differences 
are  sometimes  used.  Although  the  subject  of  bounds  on  variables  will  be  more 
fully  treated  in  section  5 it  is  appropriate  to  remark  here  that  central  differ- 
ence derivatives  are  always  attempted  for  variables  on  bounds.  However,  it  is 
possible  that  the  overstepping  of  the  bound  thereby  involved  is  impermissible  in 
which  case  a forward  or  backward  difference  has  to  suffice.  Similar  considera- 
tions apply  to  variables  which  are  free  but  are  within  Ax  of  one  or  other  bound. 

4.4  Scaling 

Because  computer  arithmetic  has  a finite  accuracy  we  cannot  guarantee  the 

behaviour  of  algorithms  applied  to  unsealed  problems,  eg  where  one  variable  is 
“3  6 

say  ~ 10  while  another  is  ~ 10  . Clearly,  no  variable  should  dominate  by 

virtue  of  its  magnitude  and  both  M0402  and  M21UNC  require  the  user  to  supply  a 

vector  of  scale  factors  XS  such  that  |x^|/XS  ~ 1 . This  implies  that  the 

k . (0) 

x^  are  not  going  to  change  by  orders  of  magnitude  between  x and  x*  . If 

they  do  then  the  user  would  be  advised  to  transform  the  variables  used  in  the 

optimisation  ( eg  to  log  x^  ) . 

Of  course,  there  are  other  reasons  of  a more  mathematical  nature  for 
scaling  x (and  also  F ) . An  important  one  concerns  the  eigenvalue  structure 
of  the  Hessian  approximation,  B . Numerical  stability  and  algorithm  efficiency 
are  assisted  by  B having  a small  spectral  condition  number  with  the  eigen- 
values spread  evenly  above  and  below  unity. 

Luenberger's  self-scaling  Hessian  updating  formula,  mentioned  in  section  3.5 
was  intended  to  do  this  automatically  but  since  it  can  be  accomplished  by  proper 
scaling,  which  is  anyway  desirable  on  other  grounds,  there  seems  little  to  be 
gained  especially  as  the  B ->  G property  is  lost. 

Note  how  changing  FS  , the  scale  factor  for  F , will  multiply  every 

eigenvalue  and  every  element  of  g simultaneously 
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(4-11) 


Clearly  the  g,  can  be  made  arbitrarily  large  or  small  by  varying  the  choice  of 

. . k T 

FS  in  particular,  so  any  convergence  criterion  based  on  g g assumes  sensible 

. T 

scaling.  M0402  does  not  actually  test  g g ; instead  it  continues  until  a line 
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search  fails  to  find  a sufficiently  different  lower  point.  It  then  tries  a 
steepest  descent  (d  = - g)  step  before  terminating.  Although  the  user 
is  called  upon  to  supply  a tolerance  (e)  its  effect  on  the  ultimate  accuracy 
of  x*  and  g*  is  rather  obscure  (see  section  6.1). 

M21UNC  does  monitor  ||g||  so  it  is  particularly  important  that  the  scale 
factors  are  well  chosen.  FS  does  not  appear  explicitly  in  M2 1UNC  (or  M0402)  - 
it  is  up  to  the  user  to  incorporate  a suitable  (constant)  value  in  his  subroutine 
CALCF.  Before  exit,  M21UNC  prints  a table  to  which  the  user  should  refer  for 

(i)  verification  of  the  solution; 

(ii)  sensitivity  analysis  (see  section  8); 


(iii)  possible  revision  of  XS  and  FS  . 


In  connexion  with  (iii),  the  diagonal  matrix  D is  output.  Those  elements 
corresponding  to  bound  variables  should  be  1.0  while  the  others  should  be  spread 
above  and  below  unity.  If  this  does  not  occur  then  changes  to  XS  and  FS 
(using  (4-11)  with  k = £)  might  be  contemplated  although  the  resulting  effects 
on  g and  x must  not  be  forgotten. 


With  M21UNC  the  desired  accuracy  of  g*  must  be  specified  and  the  program 
will  terminate  when 


and 


(i>  l|g|l2  " 

(ii)  I lB  'sl  L 


T A 

(g  gr  < tolg 

= maxl(B  *g)  | < TOLX 
k 1 k| 


(4-12) 


or  no  further  progress  can  be  made. 

It  may  be  surmised  from  sections  4.2  and  4.3  that  TOLG  ~ TOLX  ~ Ax  is  a 
suitable  arrangement  but  there  is  no  harm  in  specifying  TOLG  and  TOLX  down  to 
O(e^)  . Of  course,  much  greater  accuracy  can  be  anticipated  when  g is  expres- 
sed analytically.  Ultimately  though,  we  will  always  be  restricted  by  the 
accuracy  to  which  F can  be  calculated.  Examples  of  potential  difficulties  are 

(i)  a differential  equation  which  has  to  be  solved  numerically  (is  the 
solu.ion/numerical  method  stable?); 

(ii)  an  iterative  process  (the  termination  criteria  must  be  sufficiently 
stringent; ; 

(iii)  decisions  ( eg  if  x < I do  one  thing;  if  x >,  1 do  something  else. 
At  least  F and  9F/9x  must  be  continuous  at  x = 1). 
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If  the  user  doubts  whether  a differencing  process  can  generate  a meaningful 
estimate  of  9F/3x  then  he  is  going  to  find  it  difficult  to  solve  his  optimisa- 
tion problem  because  any  method  will  ultimately  want  to  make  small  changes  in 
x and  will  be  misled  if  F is  subject  to  gross  errors.  An  example  of  a 
virtually  impossible  problem  is  when  the  calculation  of  F involves  a Monte- 
Carlo  simulation.  , 

However,  once  a suitable  problem  is  properly  posed  there  should  be  little 
difficulty  in  successfully  implementing  M0402  or  M21UNC  especially  if  g can  be 
provided  analytically. 

5 BOUNDS  ON  THE  VARIABLES 

5 . I The  philosophies  of  M0402  and  M21UNC 

Many  practical  optimisation  problems  are  constrained  only  by  upper  and/or 
lower  bounds  on  the  variables 

XI^  $ Xj^  $ XUk  . (5-1) 

For  example  we  might  have  a set  of  non-linear  equations  with  several  solutions 
from  which  we  wish  to  isolate  one  in  particular.  This  may  be  achieved  by 
restricting  the  range  of  the  variables.  Following  on  from  this  to  the  over- 
determined system  or  nonlinear  least  squares  class  of  problem  where  we  might  be 
fitting  a model  to  a set  of  data  and  know  from  physical  considerations  that  some 
or  all  of  the  parameters  must  lie  in  a certain  range.  The  most  common  constraint 
is  x^  £ 0 . 

A similar  situation  occurs  when  a process  is  approximated  by  a formula 
which  is  only  valid  in  a certain  region  of  x-space.  Bounds  can  be  imposed  to 
prevent  the  solution  path  entering  the  undefined  region  and  possibly  terminating 
at  a spurious  minimum.  Both  M0402  and  M21UNC  require  bounds  to  be  specified. 

The  extra  labour  involved  in  checking  a ridiculous  XL  or  XU  (te  when  there  is 
really  no  bound)  is  quite  trivial.  The  two  programs  have  rather  different 
attitudes  towards  bounds.  M0402  adopts  the  simple  strategy  of  considering  every 
variable  all  of  the  time  while  M21UNC  neglects  variables  on  bounds  until  a mini- 
mum on  the  subspace  is  neared  when  the  bounded  set  is  searched  for  a variable  to 


release  which  will  decrease  F 

, ie 

a 

variable 

x^  is  sought  such  that 

8k 

< 0 

if 

*k  = 

XLk 

► • 

8k 

> 0 

if 

*k  = 

XUk  . 
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Thus  M0402  tends  to  waste  time  on  every  iteration  by  evaluating  partial 
derivatives  which  are  then  not  used  because  they  are  of  the  wrong  sign.  Even  if 
(5-2)  is  satisfied,  zigzagging  on  and  off  the  bound  tends  to  occur.  Convergence 
is  thereby  retarded  as  each  iteration  consists  of  only  a small  step.  The  poten- 
tial disadvantage  of  M21UNC's  strategy  is  that  unnecessary  effort  may  seem  to  be 
expended  minimising  on  the  reduced  subspace  yet  the  process  is  essential  to 
prevent  the  possibility  of  returning  to  that  same  set  of  variables  and  perhaps 

oscillating  indefinitely.  In  practice  M21UNC  employs  a 'diminishing  returns' 

T 

type  of  policy  whereby  variables  are  considered  for  release  when  g g < TOLG  and 
only  if  none  are  suitable  does  it  press  on  towards  the  termination  criterion 
| | g | | < TOLG  ( ie  g g < (TOLG)  ).  The  bounded  set  is  then  inspected  again  and  if 
a variable  satisfying  (5-2)  is  discovered  we  revert  to  the  first  criterion  and 
continue  with  the  enlarged  subspace  otherwise  M21UNC  is  left,  providing  condition 
(4-12  (ii))  is  satisfied,  and  the  current  best  point  is  deemed  to  be  the 
solution.  No  difficulties  have  yet  been  encountered  with  this  procedure  and  it 
of  course  saves  a considerable  number  of  function  evaluations  by  reducing  the 
effective  dimensionality  of  the  problem.  Indeed,  experience  with  practical 
problems  suggests  that  once  a variable  has  hit  a bound  it  stays  there  more  often 
than  not.  Notice  that  the  above  ideas  can  easily  be  extended  to  general  linear 
constraints  such  that  search  directions  are  projected  along  active  constraints 
and  the  constraints  are  held  until  a minimum  is  neared  when  the  signs  of  their 
Lagrange  multipliers,  are  inspected.  For  simple  bounds  X reduces  to  g (or 
*g)  • 

It  is  important  that  second  derivative  information  relating  to  variables 
on  bounds  is  deleted.  Thus  when  x^  reaches  a bound  the  elements  of  H (like- 
wise for  L and  D if  using  B ) are  reset  as  follows: 


= H 


£k 


= 0 


k ¥=  £ 


\k  = 


= 1 


(5-3) 


This  ensures  that  later  on  when  (5-2)  is  fulfilled  a function  decrease  will  be 
obtained  on  the  following  iteration  (because  d ° - g^) . Although  (5-3) 
implies  discarding  of  information  it  is  not  really  a waste  because  the  variable 
is  unlikely  to  be  freed  for  many  iterations  during  which  time  no  new  information 
regarding  the  dependence  of  the  function  on  that  variable  will  be  available. 
Therefore,  we  could  only  retain  values  from  just  before  the  bound  was  reached 
but  their  worth  when  resurrected  could  not  be  guaranteed.  This  is  not  quite 
true  of  M0402  because  g^  is  evaluated  throughout  the  time  that  x^  is  at  a 
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bound  so  that  the  relevant  quantities  in  updates  (3-24)  or  (3-26)  could  be 
specified.  However,  it  would  be  unwise  to  use  the  information  because  it  could 
lead  to  frequent  failure  of  test  (3-25)  even  with  exact  searches  because 
d^  = (-  Hg)^  does  not  hold  for  the  bound  variables.  In  other  words,  G*  might 
not  be  positive  definite  (eg  if  F depends  linearly  on  x^  , x*  = XL^  or 
XU^  necessarily).  It  only  need  be  positive  definite  in  the  subspace  of  the  free 
variables,  a result  which  is  readily  extended  to  general  linear  and  nonlinear 
constraints  such  that  the  Hessian  of  the  Lagrangian  function  at  a constrained 
minimum  has  only  to  be  positive  definite  in  the  subspace  tangent  to  the  active 
constraint  surface. 


5.2  An  example  and  a warning 
Minimise 

F (x)  = 100(x2  - x2)2  + (1  - x j ) 2 

subject  to  (5-4) 

0.2  $ x2  $ 1.0 


beginning  at  = (-  1.2,  1.0)  . 

The  unconstrained  solution  is  F*  = 0 at  x*  = (1,1)  but  the  bounds 
create  other  stationary  points.  Differentiating, 


G = 1 1200x2  - 400x2  + 2 - MOXj' 

1 - 400X]  200 


(a) 

X2 

0.2 

gives  g] 

= 0 

when 

Xj  = - 0.428,  - 0.026,  0.454 

(b) 

x2  = 

1.0 

gives  gj 

= 0 

when 

xj  = 1.0,-  0.995,-  0.005 

(a) 

(i) 

X1  = 

- 0.428  ; 

g2 

> 0 ; 

32F 

— -j  > 0 acceptable  constrained  minimum 

3x, 

(ii) 

X1  " 

- 0.026  ; 

g2 

> 0 ; 

32F 

— ■=■  < 0 unacceptable  maximum 

3Xj 

32F 

— 2 > 0 unacceptable  minimum; 

(iii) 

X1  “ 

0.454; 

82 

A 

O 

3X] 
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(b)  (i)  x 


3^F 

1.0  ; g = 0 ; — — > 0 global  minimum 

3x, 

2 

3^F 

(ii)  x.  = - 0.995  ; g > 0 ; — - > 0 unacceptable  minimum 

9xl 


32F 

(iii)  Xj  = - 0.005  ; g9  > 0 ; — j < 0 acceptable  constrained  maximum 

3Xj 


where  by  'unacceptable'  we  mean  that  the  bound  could  profitably  be  released 
(by  (5-2)  or  complementary  criteria  in  the  case  of  maxima). 

Thus  a minimisation  can  (and  both  M0402  and  M21UNC  do)  terminate  properly 

at  (-0.428,  0.2)  rather  than  (1,1).  Now  let  us  consider  the  Hessian  matrix, 

G . At  (-0.428,0.2)  the  complete  matrix  is  / 142  171. 2\  which,  having  a 

\ 1 7 1 .2  200  / 

negative  determinant,  is  not  positive  definite.  Hence  updating  the  full  B or 
H approximation  to  G or  G 1 while  maintaining  the  positive  definite  require- 
ment will  prevent  the  solution  from  being  attained.  However,  if  B or  H is 
adjusted  as  in  (5-3)  when  the  bound  is  first  encountered  then  we  would  have 

B*  = (142  OS  which  is  positive  definite.  Note  that  in  this  example,  once 

\ 0 1/ 

x_  has  hit  its  lower  bound,  whether  or  not  G is  positive  definite  is  deter- 
7 .22 

mined  by  the  sign  of  3 F/3Xj  ; hence  the  interpretations  given  above. 

One  further  point  mentioned  in  section  4.3  is  that  the  specified  bounds 
are  not  strictly  observed  in  that  when  evaluating  derivatives  numerically  a bound 
can  be  exceeded  by  an  amount  Ax  . Usually,  this  is  of  no  consequence  but  it  is 
not  difficult  to  envisage  a disastrous  situation  ( eg  taking  x^  where  x >,  0 
supposedly).  To  guard  against  this,  a facility  is  provided  whereby  the  user  can 
check  for  such  violation  and  refuse  to  calculate  F in  which  case  M0402  and 
M21UNC  make  do  with  the  appropriate  single  difference. 

5 . 3 The  complete  algorithm  of  subroutine  UNCMIN 

We  are  now  in  a position  to  sketch  the  complete  algorithm  (A-4)  which 
M21UNC  implements  via  subroutine  UNCMIN  (see  also  Fig  2). 

(1)  Initialise  L,  D . Calculate  F^  . Identify  any  variables  on 
bounds.  Set  6 * TOLG  , i = 0 . 

(2)  Compute  g^  by  central  differences. 

T T 

(3)  Co-jpute  g g . If  g g > 6 , go  to  (10). 
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(4)  If  forward  differences  are  in  use,  convert  g to  central  differences 
and  go  to  ( 3) . 

(5)  Central  differences  already  so  test  for  release  of  bounds.  If  a 
suitable  one  is  found  (5-2)  , set  6 = TOLG  and  go  to  (10). 

(6)  No  bound  variable  can  be  released  so  if  the  line  search  has  failed 
go  to  (16),  otherwise 

(7)  if  6 = TOLG2  go  to  (9) . 

(8)  Set  5 = TOLG2.  If  gTg  > 6 , go  to  (10). 

(9)  Terminate  with  IEXIT  = 1 if  | | B~ 1 g | | ^ < TOLX  . 

(10)  Output  i , NF  , F , x , g if  required.  Set  i = i + 1 . 

-1  T T . (i) 

(11)  Compute  d(=  B g)  , d d,d  g . Estimate  a from  (6-4). 

(12)  Perform  a line  search.  If  unsuccessful,  go  to  (4). 

(13)  If  a new  bound  hit,  reset  appropriate  elements  of  L and  D to 
0 and  1 respectively. 

(14)  If  using  central  differences,  check  whether  forward  differences  can 
be  used  henceforth  (a^  > ctj  : see  (4-9)). 

(15)  Compute  . Update  L and  D if  possible.  Go  to  (3). 

(16)  Perform  local  search.  If  no  lower  point  found,  terminate  with 
IEXIT  = 4 , otherwise  go  to  (2). 

If  analytic  partial  derivatives  are  available  then  steps  (4)  and  (14)  will 
be  omitted  and  references  to  central  differences  are  changed  to  read  'analytic 
expressions' . 

The  only  aspects  of  the  above  algorithm  which  have  not  yet  been  described 
are  the  searches  (12)  and  (16).  They  are  therefore  the  subject  of  the  next 
section. 


6 THE  LINE  SEARCH  AND  THE  LOCAL  SEARCH 
6 . I The  one-dimensional  search 

The  notion  of  a line  search  was  introduced  in  section  3 and  refers  to  the 
process  of  selecting  in  the  quasi-Newton  iteration 


x(i+,) 


a(i)H(i)g(i)  . 
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We  recall  that  the  rank-1  matrix  updating  formula  does  not  require  a search 


as  such  but  we  still  want  F^1+*^  < F^ 


and  to  be  positive  definite. 


The  latter  proves  a stumbling  block  for  the  rank- I correction.  Certain  rank-2 
updates  have  the  property  that  they  can  always  be  applied  with  safety  (except  for 
round-off  error)  if  the  line  search  is  exact  but  it  is  customary  to  satisfy  a 

. . . t 

less  stringent  condition  and  hope  that  P q > 0 is  still  fulfilled. 

The  accuracy  of  the  search  can  be  specified  in  M21UNC  (which  uses  the  NPL 
subroutine  LINSCH)  but  not  in  M0402.  Both  subroutines  basically  use  quadratic 
interpolation  to  estimate  the  position  of  the  first  minimum  along  d . One  needs 


a procedure  for  determining  the  initial  step  a 


(i) 
0 


after  which  the  minimum  can 

(i) 


be  interpolated  or  extrapolated,  first  by  making  use  of  the  gradient  at  x 


3F 

9a 


a=0 


(i)Tj(i) 


Subsequent  quadratic  fits  are  made  using  three  function  values.  M0402  retains 
three  points  such  that  the  minimum  is  bracketed  whereas  LINSCH  tends  to  keep 
the  three  lowest  points  and  therefore  has  to  include  safeguards  to  prevent 
absurd  extrapolations.  In  neither  routine  are  partial  derivatives  required  until 
the  line  search  has  been  completed. 


An  important  point  is  that  F 

g 

as  it  stands  (Fletcher  ).  Instead, 


<i+1>  < F(i)  i 
a 'sufficient' 


s not  a satisfactory  condition 
decrease  is  demanded 


F<i+1> 


,(i) 


(i)  (i)T.(i) 
ya  g d 


(6-1) 


where  0 < y < [ . LINSCH  sets  y * 0.0001  . The  accuracy  of  the  line  search  is 
determined  by  n (0  q < 1)  which  causes  to  be  chosen  so  that 


(i+l) 


o 


* 


ng 


(i)Td(i) 


(6-2) 


where  o is  the  largest  value  of  a less  than  and  F^+l^  is  the  lowest 

function  value  found.  Thus  q = 0 specifies  an  exact  line  search  while 
q = 0.99  would  imply  a slack  search.  Having  chosen  in  this  way  it  is 

conceivable  that  (6-1)  will  be  violated  in  which  case  is  successively 

halved  until  (6-1)  holds.  If  this  happens  then  the  final  accepted  point  will 
not  necessarily  be  the  lowest  point  found. 


132 


37 


As  was  discussed  in  section  4.3,  it  must  also  be  ensured  that  the  step 
p^(=  a^d^)  is  not  unrealistic  given  the  accuracy  to  which  F can  be 
calculated.  If  no  satisfactory  point  is  found  with  a >,  a line  search 
failure  is  recorded.  This  normally  means  that  the  error  in  g is  such  that 
the  resulting  d is  an  uphill  direction.  It  will  be  noticed  that  algorithm  (A-4) 
does  not  implement  the  'soft'  lower  bound,  such  that  if  no  a >, 

can  be  found  we  switch  to  central  differences.  No  obvious  problems  have  resulted 
from  this  decision  and  it  has  in  general  led  to  greater  efficiency. 

The  one  dimensional  search  thus  needs  to  be  a rather  sophisticated 
routine  if  it  is  to  make  the  most  efficient  use  of  function  values.  A saving 
of  two  or  three  function  calls  per  line  search  can  result  in  a considerable 
reduction  in  overall  running  time  since  objective  function  evaluation  is  the 
dominant  process  in  most  practical  problems.  The  choice  of  n is  one  best 
left  to  the  user  for  it  depends  on  the  individual  problem  and  on  how  derivatives 
are  calculated.  It  is  recommended  that  one  starts  with  a tight  search  and  then 
explores  the  effects  of  relaxing  it  (eg  n = 0.01,  0.1,  0.5,  0.9). 

However  efficient  the  line  search  procedure  may  be  it  will  be  enhanced  by 
the  provision  of  a good  estimate,  . First,  the  maximum  stop  permitted 

by  the  bounds  on  the  variables  is  calculated 


We  then  set 


m 


min 

k 


H 


XLk  - 4° 


dnr 

k dk 


if  d^l)  > 0 
k 


if  d^  < 0 
k 


y • (6-3) 


(i)  ... 

al  = min  1 , 


(,  2AF  \ 

Kv 


(6-4) 


As  the  minimum  is  neared  and  B ->  G it  is  expected  that  a -+•  1 . The  second 
term  on  the  right  of  (6-4)  is  based  on  a local  quadratic  approximation  to  F . 
After  the  first  iteration  we  set  AF  = F^  - F^  ^ . If  F were  locally 
quadratic  we  would  have 

F (x  + p)  = F(x)  + pTg  + JpTGp  (6-5) 

and 

g (x  + p)  - g (x)  + Gp  . (6-6) 
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With  an  exact  search  g(x  + p)Tp  = gTp  + pTGp  = 0 . Therefore  in  (6-5) 

T 

F(x  + p)  = F(x)  + ig  p . But  p = ad  , giving 


_ 2 (F  (x  + p)  - F (x)  ) 

T, 
g d 


(6-7) 


In  (6-4),  AF  is  used  because  F ^ is,  of  course,  unknown  at  the  start  of 
the  iteration. 


M0402  operates  slightly  differently.  It  employs  the  user's  tolerance 
parameter,  E(<  1)  , to  define  a minimum  first  step  and  endeavours  to  choose 

0(>  to  restrict  the  change  in  x 


then 


a 


min 


0.  IE 

run: 


(6-8) 

(6-9) 


max  (a,  a2) 


(6-10) 


Steps  smaller  than  are  permitted  but  after  two  such  consecutive  steps, 
M0402  sets  d = - g and  terminates  if  a < again  occurs  (see  the  flow 
chart,  Fig  1). 


An  idea  of  the  intended  accuracy  of  M0402's  line  search  can  be  obtained 
from  the  original  matrix  update  criterion  which  was  much  more  restrictive  than 


(3-25). 


. (i+l)T  . 
I g P I 


* - O.lg 


(i) 


T 

P • 


(6-11) 


This  has  now  been  relaxed  to  allow  more  frequent  updating.  Moreover,  the 
original  program  reset  H to  I after  every  n updates.  The  current  version 
does  not  discard  H arbitrarily  and  is  consequently  more  efficient  and  in 
practice  no  less  robust. 


6 . 2 The  local  search 

The  idea  of  the  local  search  is  due  to  Gill  and  Murray  and  is  intended  as 
a means  oi  ronfirming  a solution  or  for  moving  away  from  a saddle  point.  M0402 
does  not  have  such  provision  and  M21UNC  only  activates  it  in  the  event  of  a line 
search  failure  (see  algorithm  (A-4)  at  the  end  of  section  5).  The  procedure  as 
implemented  in  M21UNC  is  as  follows. 
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(1)  Take  an  arbitrary  step  away  from  x to  x' . 

(2)  Choose  a direction  orthogonal  to  (x'  - x)  and  take  a step  to  find 
which  way  is  downhill. 

(3)  Perform  an  exact  line  search  along  (x'  - x)  to  get  z . 

(4)  Perform  an  exact  line  search  along  (z  - x)  to  get  x' ' . 

The  initial  step  (1)  is  of  the  same  magnitude  for  each  free  variable  and  is 

directed  towards  the  further  bound.  The  step  (2)  is  of  the  same  magnitude  but 

alternate  components  are  reversed  to  obtain  orthogonality.  The  last  component 

must  then  be  set  to  zero  if  there  is  an  odd  number  of  free  variables,  eg  if 

step  (1)  were  (\p,  \p,  -ip,  ip,  -ip,  -ip,  ~ip)  then  step  (2)  would  be 

(-((;,  \p,  ip,  \p , ijj,  -ip,  0)  . Both  line  searches  can  then  be  performed  without 

violating  any  bounds  because  a is  determined  using  (6-3)  with  d.  = 1 and  ip 

m k 

is  then  set  by 

ip  = min  (TOLX.Ja  ) (6-12) 

m 

where  TOLX  is  the  desired  accuracy  in  the  x^  . If  the  second  search  (4) 
terminates  very  near  x then  this  is  accepted  as  the  best  that  M21UNC  can  do 
and  the  user  is  left  to  ponder  why  his  specified  termination  criteria  have  not 
been  achieved.  Otherwise,  the  minimisation  continues  from  x' ' . An  especial 
virtue  of  the  local  search  is  that  no  derivative  information  is  required. 

7 M0402  OR  M21UNC? 

M0402  is  five  years  older  than  M21UNC,  It  has  therefore  been  applied 
more  widely  and  has  proved  to  be  a reliable  program.  Also,  it  was  written 
entirely  in  Mathematics  and  Computation  Department  and  hence  may  be  used  by 
people  connected  with  but  outside  RAE.  M21UNC  cannot  be  distributed  elsewhere 
because  of  the  NPL  subroutines  within  it.  It  is  also  a much  larger  program 
physically  but  this  should  be  immaterial  in  most  cases.  The  main  virtues  of 
M21UNC  are  its  improved  efficiency  and  sensitivity  information.  We  have  not 
found  any  problem  on  which  M0402  performs  better  ( ie  lower  number  of  function 
evaluations,  NFE)  than  M21UNC  and  it  is  necessarily  much  less  helpful  for  post- 
optimal  sensitivity  analysis  (see  section  8)  and  in  fact  generally  less  flexible. 
On  the  other  hand,  this  makes  it  relatively  simple  to  use  whereas  M21UNC  does 
invite  the  user  to  experiment  with  the  data  which  may  not  always  be  worthwhile. 

The  data  for  the  two  programs  (see  Appendices  B and  C for  more  details) 
are  naturally  similar  but  it  is  left  to  the  user  to  arrange  its  input.  Both 
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need  x^,  XL,  XU  and  XS  , an  upper  limit  to  NFE,  a print  control  parameter, 
MODE  (of  operation).  Ax  (finite  difference  interval) (if  applicable)  and 
tolerance(s)  (E  for  M0402  see  section  6.1;  TOLG  and  TOLX  for  M21UNC).  In 
addition  M21UNC  needs  ETA,  the  line  search  accuracy.  The  user  has  to  provide  a 
subroutine  CALCF  to  evaluate  F(x)  and,  if  analytic  derivatives  are  available, 
subroutine  DERIV  is  also  needed. 

Regarding  performance,  we  are  not  presenting  any  specific  figures  because 
test  problems  are  not  very  representative  of  real  life  although  they  are  helpful 
during  program  development.  Standard  test  problems  tend  to  be  readily  differen- 
tiable and  the  accuracy  to  which  F is  calculated  is  0(e)  because  the 

m 

functions  are  fairly  trivial.  It  is  then  not  surprising  that  finite  differences 
work  almost  as  well  as  analytic  expressions.  Furthermore,  apparently  spectacular 
performance  on  test  problems  claimed  in  the  literature  is  often  merely  a conse- 
quence of  the  way  in  which  function  calls  are  counted,  eg  an  analytic  gradient 
evaluation  counted  as  one  or  even  ignored  entirely  if  g is  always  calculated 
with  F (£e  a cubic  line  search).  Again,  the  accuracy  of  the  line  search 
affects  NFE  and  this  may  well  have  been  optimised  or  if  a slack  search  is  used 
and  gradient  calls  ignored,  NFE  will  probably  be  particularly  low.  However, 
slack  searches  are  often  extremely  expensive  when  finite  differences  are 
necessary  and  this  is  especially  true  when  minimising  penalty  functions  where 
F(a)  is  less  likely  to  be  unimodal  because  of  the  influence  of  the  constraint 
terms.  Hence  the  general  advice  given  in  section  6.1  to  initially  use  a tight 
line  search  when  attempting  a new  problem. 

Thus  there  seems  little  point  in  trying  to  compare  performance  with  other 
published  algorithms  but  comparison  between  M0402  and  M21UNC  is  realistic 
because  we  know  that  they  have  been  applied  in  the  same  way.  When  allowance  is 
made  for  differences  in  reckoning  NFE,  M2iUNC  seems  to  be  as  efficient  as  any 
program  using  quadratic  line  searches  described  in  the  literature. 

Both  M0402  and  M21UNC  form  the  basis  of  nonlinearly  constrained  optimisa- 
tion programs.  M0402  is  used  in  the  SUMT  program,  M0414K  which  uses  a barrier 

function  to  tackle  inequality  constrained  problems  while  M21UNC  appears  in 

2 . 20 
M21IPF  (Purcell  ) which  implements  a variation  of  Fletcher's  'ideal'  penalty 

function  for  inequality  and  equality  constraints. 

8 SENSITIVITY  ANALYSIS 

Quite  often  the  user  will  want  to  know  how  sensitive  is  the  solution  pro- 
vided by  a minimisation  routine.  For  example,  he  might  like  to  know  how  flat 


m—  I 
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the  optimum  is  or  what  would  happen  if  an  active  bound  were  relaxed  slightly  and 
F reoptimised  while  if  fitting  a curve  through  a set  of  points  the  statistical 
errors  associated  with  the  optimum  parameters  would  be  appreciated. 

M0402  returns  x , F , g and  H but  does  not  provide  any  other  informa- 
tion. It  is  therefore  up  to  the  user  to  form  the  relevant  expressions  below 
and  to  interpret  them.  Such  a policy  at  least  requires  him  to  make  a conscious 
effort  to  extract  more  details  which  is  perhaps  better  than  merely  giving  him 
figures  which  he  may  be  tempted  to  employ  without  properly  considering  their 
worth.  This  will  probably  be  particularly  true  of  problems  in  which  the  calcula- 
tion of  F is  of  low  accuracy  or  when  finite  difference  derivatives  have  been 
used  (the  former  usually  implies  the  latter  anyway). 

Now,  M0402  does  not  aim  for  any  specific  accuracy  in  g or  x so  they 
must  be  checked  first,  g can  be  output  immediately  while,  denoting  the  true 
solution  by  x*  , we  have  for  the  free  variables 

x*  - x - - Hi  (8-1) 


and  conditions  (5-2)  should  not  hold  for  any  bound  variable.  Note  also  that  the 
magnitude  of  derivatives  with  respect  to  bound  variables  indicates  the  degree 
to  which  that  bound  is  restraining  F*  ie  relaxing  bound  k by  6x^  will  cause 
F*  to  decrease  by  58  g*6x^  • 

The  local  curvature  of  F can  be  explored  only  with  M21UNC,  using  either 

2 2 

of  the  columns  headed  'Estimates  of  3 F/3x  . The  entries  in  these  columns 

A A 

should  be  similar  for  the  free  variables.  One  is  obtained  from  L and  D 

while  the  other  is  found  from  (4-8)  (if  the  finite  difference  version  of  the 

2 2 

program  is  being  used).  3 F/3x  for  the  bound  variables  can  also  be  calculated 
from  (4-8)  but  1.0  must  necessarily  appear  in  the  other  column.  However,  if 
CALCF  prohibits  the  overstepping  of  the  bound  then  the  entry  in  the  'Differences' 
column  is  set  to  1.0E20. 


Writing  (0,  ...0,  6x^,  0...0)  = 

gives 

F (x  + (0,6xk)T)  - 


(0,  6x^) , a Taylor  expansion  about 
F(x)  + fix^  + • 


x 

(8-2) 


While  (8-2)  gives  an  idea  of  the  degradation  in 
fix^  it  is  not  easy  to  estimate  the  rise  in  F* 
variables  if  the  system  is  reoptimised  keeping 


F on  changing  a variable  by 
or  the  changes  in  the  other 
x^  fixed  at  + 6x^  . 
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The  difficulty  is  created  by  the  presence  of  bounds.  To  be  more  precise,  although 
we  could  with  some  effort  estimate  the  changes  in  the  free  variables  we  have  no 
way  of  deciding  whether  the  bounds  active  at  x*  would  remain  active  because  the 
necessary  second  derivative  information  is  not  available.  Formulae  for  estimating 
the  new  optimum  and  the  corresponding  increase  in  F*  are  derived  in  Appendix  A 
assuming  x = x*  and  that  the  set  of  free  variables  at  the  perturbed  solution 
is  the  same  as  at  x*  • AH  things  considered,  the  user  will  doubtless  find  it 

A t 

more  convenient  and  meaningful  to  rerun  the  program  starting  at  (x  + (0,6x^)  ) 
and  feeding  in  (H  or)  L and  D as  the  initial  (inverse)  Hessian.  Such  a 
computer  run  should  be  relatively  short  because  of  the  good  starting  values. 

While  on  this  subject,  it  should  be  stressed  that  in  principle  any  positive 
definite  matrix  will  do  to  start  a minimisation.  The  unit  matrix  is  chosen  by 
default  when  no  other  estimate  is  available  because  it  is  well-conditioned  and 
directs  the  solution  path  down  the  first  order  steepest  descent  direction,  but 
clearly  for  a general  problem  it  will  be  far  removed  from  let  alone  G*  . 

The  finite  difference  version  of  M21UNC  actually  attempts  to  inprove  d^^  by 
using  (4-8)  to  generate  a better  scaled  diagonal  approximation  to  G^  . 

If  the  original  problem  was  parameter  fitting  by  nonlinear  least  squares 
then  the  covariance  matrix  will  be  of  interest.  H*  is  this  matrix  except  for 
a multiplying  factor.  For  example,  find  x = (x, , ...»  x ) to  minimise 

m 

F (x)  = ) w (y  - f.(x))2  ; m > n . 

j=i 

Then  an  estimate  of  the  residual  variance  of  the  best  fit  is 

2 F 

a = 

m - n 

2 -1 

and  the  covariance  matrix  is  2o  (G*) 

• • A " | 

In  substituting  H or  B for  (G*)  the  user  should  remember  that  the 

degree  of  similarity  is  uncertain.  However,  a higher  level  of  confidence  is 

warranted  if  analytic  derivatives  have  been  used  and  | |g| | is  very  small.  With 

M2IUNC,  subroutine  HESINV  has  to  be  called  by  the  user's  program  in  order  to 
. *“1 

obtain  B explicitly  which  it  does  by  solving  the  n simple  systems  of  n 
linear  equations 
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LDL1^  = Ik  (8-3) 

where  I,  is  the  kth  column  of  the  unit  matrix 

"k  --1  -1 
and  h^  becomes  the  kth  column  of  B . Because  of  the  symmetry  of  B 

the  kth  system  need  only  contain  (n  + 1 - k)  equations. 

The  various  features  of  M0402  and  M21UNC  described  in  this  section  are 
summarised  in  Table  1 . 

9 CONCLUSIONS 

We  have  attempted  to  explain  the  development  of  quasi-Newton  (variable 
metric)  methods  for  the  unconstrained  optimisation  (minimisation)  of  nonlinear 
(objective)  functions  and  to  describe  in  some  detail  the  construction  of  a 
FORTRAN  computer  program,  M21UNC  incorporating  current  ideas  on  efficiency, 
stability  and  accuracy. 

The  corresponding  aspects  of  an  older  and  no  less  reliable  program, 

M0402,  were  described  simultaneously. 

Although  some  ideas  for  enhancing  M21UNC  have  been  suggested  in  this  Report 
and  are  to  be  investigated,  it  is  unlikely  that  any  dramatic  improvement  will 
result.  Our  primary  interest  is  currently  in  the  area  of  nonlinear ly  constrained 
optimisation  where  we  feel  that  much  remains  to  be  done  despite  the  rapid 
advances  of  recent  years.  Also  useful  would  be  a program  specially  for  nonlinear 
least  squares  fitting  which  was  more  efficient  than  the  general  purpose  programs 
described  here  but  equally  robust. 

Finally,  we  restate  our  belief  that  M0402  and  M21UNC  are  very  efficient 
minimisers  of  nonlinear  functions  of  several  variables  subject  only  to  upper  and 
lower  bounds.  In  common  with  every  other  implementation  they  do  not  claim  to  be 
able  to  locate  the  global  minimum  of  a non-convex  function.  If  the  user  is 
worried  that  many  local  minima  exist  he  should  try  rerunning  his  program  from  a 
different  x^  . Furthermore,  it  is  recognised  that  the  continuity  requirements 
n the  method  could  rule  out  a substantial  class  of  problems.  At  the  other 
extreme,  problems  in  which  second  derivatives  can  be  calculated  analytically  are, 
in  our  experience,  sufficiently  rare  that  a separate  program  capitalising  on  the 
extra  information  is  not  justified. 
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Appendix  A 

PERTURBING  THE  MINIMUM  AND  REOPTIMISING 
Consider  the  unconstrained  problem 

Minimise  F(x)  ; x = (x, ,x„ , . . . ,x  ) . 

Let  x*  be  a solution,  ie  g(x*)  = 0 . 

Then,  let  one  of  the  variables,  say  xn  be  perturbed  by  fix^  . To 
second  order. 


(A- 1 ) 


F'  = F^x*  + (0,<Sxn)Tj  = F*  + |(6xn)‘ 


B* 

nn 


(A-2) 


where  F*  = F(x*) 
and  B*  = G*  = 


3*F 


kJt  k*.  3x,  3x 


x* 


Now,  holding  the  perturbed  variable  constant,  reoptimise  F by  varying 
the  other  (n  - 1)  parameters.  Let  the  new  optimum  be  x**  and  define 
6x*  = x**  - x*  . 


Then,  to  second  order. 


and 


Writing 


B* 


F** 

= f* 

g**  = 

g*  + B 

/ ~ 

= / B* 

b \ 

L 

. .'.in. . 

1 T 

i 

l b 

' B* 

\-n 

l nn/ 

and  5x*  = 


fix* 

6x 


(A- 3) 


(A-4) 


(A-5) 


gives,  with  (A-4), 

Therefore , 

From  (A-6) 


B*6x*  + 6x  b =0 
n-n 


B*6x*  = (0,bT6x*  + 6x  B*  )T 

- -n — n nn 


(A-6) 


(A-7) 


fix* 


- 6x  (B*)  'b^ 
n n 


(A-8) 


46 


Appendix  A 


(A- 7)  and  (A-8)  in  (A-3)  give 

F**  = F*  + J(<5jc  ) 2 (b*  " bT(B*)"'b  ) . (A-9) 

n y nn  -n  -n j 

Relations  (A-8)  are  therefore  estimates  of  the  changes  in  the  other  (n  - 1)  vari- 
ables when  one  variable  is  perturbed  and  the  system  reoptimised  while  expression 
(A-9)  estimates  the  function  value  at  the  new  minimum.  We  vould  therefore  expect 
the  following  inequality  to  hold 

F*  < F**  < F'  . (A- 10) 


F*  < F'  and  F**  < F'  follow  from  (A-2)  and  (A-9)  respectively  because  principal 
sub-matrices  of  a positive  definite  matrix  are  themselves  positive  definite. 

To  confirm  F*  < F**  requires  the  following  proposition  (see  (A-9))  to  be 
verified. 


If  an  arbitrary  positive  definite  matrix  B*  is  partitioned  as  in  (A-5) 


then 


nn 


T 

> b (B*) 
-n 


(A- 1 1 ) 


Proof 

Since  B*  is  positive  definite, 


T * 

u B u > 


Let  u = (v,a) 

where  v is  an  arbitrary  (n  - 1)  vector 
and  a is  an  arbitrary  scalar. 


for  all  non-zero  u . 

(A- 12) 


Then  ( A- 1 2 ) be  come  s 

vTB*v  + 2abTv  + a2B  > 0 . 
-n-  nn 


Completing  the  square  yields 

(v  + a(B*)_1bnJTB*|y  + a(i*)~'bj  + a2^B*n  - b^(B*f 'bj  > 0 . (A-13) 

~ - j 

Now  the  choice  v = - a(B*)  b zeroises  the  first  term.  Therefore, 

— -n 

property  (A- 12)  can  only  hold  if  the  second  term  in  (A-13)  is  always  positive, 
which  establishes  the  result  (A— 11)  and  hence  (A-10). 
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Clearly,  a lot  of  extra  computational  effort  is  involved  in  calculating 
6x*  and  F**  particularly  as  B*  is  not  explicitly  required  for  any  other 
purpose  let  alone  inverses  of  submatrices  of  B*  . In  practice  therefore,  it  is 
better  to  rerun  the  minimisation  program  as  described  in  section  8. 
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Appendix  B 
USER  GUIDE  TO  M0402 

B. 1 Purpose 

- To  solve  the  problem 

Minimise  F(x) 

subject  to 

XI^  « x^.  * XUk  k = 1 ,2,...  ,n 

where  F is  a differentiable  nonlinear  function  of  the  variables  x . Finite 

difference  approximations  to  3F/9x  are  constructed  when  necessary  if  analytic 

expressions  are  not  available. 

B.2  Argument  list 

CALL  M0402  (F,G,N,X,XL,XU,XS,E,MT,IP,NH,H,WR,WD,MODE) 

F A REAL  variable  into  which  the  user  may  put  the  value  of  the  objective 
function  corresponding  to  the  initial  X (see  MODE  below).  It  will 
contain  F(x)  on  exit. 

G A REAL  array  of  N elements  to  hold  the  vector  of  partial  derivatives 

9F/9x  . G may  be  set  on  entry  (see  MODE)  and  will  contain  9F/9x  on  exit. 

N The  number  of  variables. 

X A REAL  array  of  N elements.  An  initial  approximation  (unsealed)  must  be 

set  on  entry  to  M0402  and  the  best  values  found  will  be  returned  (unsealed) 
on  exit. 

XL  A REAL  array  of  N elements  defining  lower  bounds  on  the  x^  . 

XU  A REAL  array  of  N elements  defining  upper  bounds  on  the  x^  . 

XS  A REAL  array  of  N elements  containing  positive  scale  factors  for  the 

variables,  ie  XS^  ~ 

E A REAL  variable  holding  the  accuracy  parameter.  M0402  terminates  with 

MODE  = 1 (see  below)  after  three  successive  iterations  in  which  the  largest 
scaled  change  in  the  variables  is  <0.1E  . E ■ 0.01  is  recommended. 

MT  Maximum  number  of  calls  to  CALCF  permitted.  On  exit  the  number  of  calls 
actually  made  is  stored  in  MT. 
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IP  Print  control  parameter.  The  intermediate  output  subroutine,  REP  is 

called  every  IP  calls  to  CALCF  (IP  > 0)  or  every  -IP  iterations  (IP  < 0) . 
If  no  printing  is  required  set  IP  = MT  , otherwise  IP  = - 1 is 
recommended.  On  exit,  IP  indicates  the  number  of  iterations  performed 
(i  on  the  flow  chart;  see  Fig  1). 

NH  Must  be  set  to  N*(N  + l)/2  . 

H A REAL  array  of  size  NH  used  for  storing  the  lower  triangle  by  columns 

of  the  inverse  Hessian  matrix.  An  initial  positive  definite  approximation 

may  be  supplied,  otherwise  put  H ( 1 ) = 0.0  causing  M0402  to  set  the  unit 

matrix  in  H . The  current  estimate  of  G * is  returned  on  exit. 

WR  A REAL  array  of  N elements  used  as  working  space. 

WD  A DOUBLE  PRECISION  array  of  3N  elements  used  as  working  space  for  the 
accumulation  of  matrix-vector  products. 

MODE  On  entry;  MODE  = 1 indicates  that  only  X is  supplied. 

= 2 indicates  that  F(X)  is  also  supplied. 

= 3 indicates  that  G(X)  is  also  supplied. 

On  exit;  MODE  = 1 convergence  criteria  satisfied  (re  no  significant 
further  progress  can  be  made) . 

= 2 F could  not  be  calculated  at  the  point,  X , supplied 
by  the  user. 

= 3 An  acceptably  low  point  has  been  reached. 

= 4 The  maximum  number  of  calls  allowed  to  CALCF  has  been 
reached. 

B. 3 User  subroutines 

The  calculation  of  F(x)  must  be  carried  out  in  a subroutine  headed 

SUBROUTINE  CALCF  (N,X,F,IC) 

DIMENSION  X(N)  . 

On  entry,  IC  indicates  whether  F is  required  for  the  line  search  (IC  = 1)  or 
. 3F 

for  the  estimation  of  by  central  differencing.  On  exit,  N and  X 

3x(IC-3) 

must  be  unchanged  but  the  user  can  set  IC  = 2 if  F cannot  be  calculated  or 
(if  IC  ■ 1)  IC  = 3 to  indicate  that  the  value  of  F just  calculated  is  low 
enough  to  be  an  adequate  solution  of  the  problem.  IC  should  not  be  altered  for 
any  other  reasons.  The  user  should  so  scale  his  objective  function  that  the 
F value  returned  by  CALCF  is  of  order  unity. 
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If,  on  the  other  hand,  analytic  derivatives  are  available,  CALCF  can  only 
be  called  with  IC  = 1 and  the  user  must  supply  subroutine  DERIV  to  evaluate  G . 

SUBROUTINE  DERIV  (N,X,F,G) 

DIMENSION  X(N) , G(N) 

N,X  and  F should  not  be  altered.  F is  included  in  the  arguments  in  case  it 
shortens  the  calculation  of  derivatives. 

B. 4 COMMON  area 

The  subroutine  which  calls  M0402  must  include  the  statement 


COMMON / MO 402A/ DELTA 


where  DELTA  is  the  scaled  differencing  interval  (Ax(>0))  for  every  variable  and 
should  be  a data  item  so  that  a suitable  value  can  be  discovered  by  experiment. 
DELTA  in  the  range  0.001  to  0.00001  is  suggested  (see  section  4). 

When  analytic  derivatives  are  provided  via  subroutine  DERIV,  the  user 
should  set  DELTA  =0.0  . 


B.5  Other  subroutines 

M0402  calls  CALCG  and  REP.  CALCG  either  evaluates  derivatives  numerically 
(by  calling  CALCF  2n  times)  or,  if  DELTA  =0  it  calls  DERIV.  The  latter 
course  is  preferable  in  every  way  although  the  analytic  expressions  should 
initially  be  checked  by  comparison  with  the  finite  difference  version.  For  each 
call  of  DERIV,  n is  added  to  the  count  of  function  calls. 


The  calling  of  REP  is  governed  by  the  parameter  IP.  It  outputs  to 
channel  2 in  the  following  format  which  the  user  may  alter  to  suit. 


IT  NF 
F 

X (K) ,K= 1 ,N 
G(K) ,K= 1 ,N 


iteration  number  (i);  number  of  function  evaluations  so  far. 
the  objective  function  F(x^). 
the  best  (lowest)  point  so  far  found, 
partial  derivatives  (3F/9x)^\ 


M0402  and  CALCG  do  not  produce  any  output . 


B.6  General 

(a)  M0402  was  written  by  Mr  Piggott  of  Mathematics  and  Computation 

Department,  RAE  with  small  modifications  by  the  present  author.  It  is  a 
relatively  small  program  and  tests  on  many  practical  problems  have  verified 
its  robustness. 
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(b)  Switching  between  numeric  and  analytic  derivatives  merely  involves 
changing  DELTA. 

(c)  There  is  no  restriction  on  problem  size  because  no  arrays  are  explicitly 

dimensioned.  In  general  one  might  expect  NFE,  the  total  number  of 

2 

calls  to  CALCF  required  to  increase  at  least  as  fast  as  n 

(d)  For  sensitivity  analysis  see  section  8.  Regarding  accuracy,  upon  a 
normal  exit  (MODE  = 1),  the  user  should  check  that  the  elements  of 
array  G are  small  («1)  for  all  the  free  variables  and  that 


G < 0 
k 


if 


\ 


XU, 


G > 0 
k 


if  *k  = XLk 


If  not,  X is  not  a true  minimum  and  a reason  should  be  sought  for 
the  premature  exit  from  M0402. 

(e)  M0402  can  be  transported  outside  RAE  for  associated  applications  and 

implementing  it  on  non-ICL  1900  computers  should  not  present  any 
difficulties. 
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USER  GUIDE  TO  M21UNC 


C.  1 Purpose 

To  solve  the  problem 

subject  to 


Minimise  F(x) 


XLk  * *k  « XUk  k 


= 1,2,. 


. ,n 


where  F is  a differentiable  nonlinear  function  of  the  variables  x . There  are 
two  versions  of  the  program,  one  for  when  partial  derivatives  can  be  expressed 
analytically  while  the  other  uses  finite  difference  approximations.  The  minimisa- 
tion algorithm  (A-4)  (see  section  5.3)  is  implemented  in  UNCMIN  with  M21UNC  merely 
acting  as  an  interface  with  the  user's  calling  segment. 

C. 2 Argument  list 

CALL  M2IUNC  (N,X,F,G,XL,XU,XS,JBD,HD,NN,HL,NW,W,HH,TOLX,TOLG,MAXFN,IPRINT, 
ETA, MODE, IEXIT) 

N the  number  of  variables  (^2) . 

X A REAL  array  of  size  N containing  initial  values  for  x . The  solution 

x will  be  returned  in  X . 

F A REAL  variable  holding  the  objective  function  value.  F(X)  must  be 
preset  when  MODE  =2  or  4.  On  exit,  F contains  F(x). 

G A REAL  array  of  size  N for  3F/3x  . G must  be  preset  when  MODE  = 2 


XL 

XU 

xs 

JBD 


4.  On  exit,  G contains  (3F/3x)  for  the  free  variables  and  G^  = 0.0 


or 

for 


the  bound  variables  (see  also  W below) . 

A REAL  array  of  N elements  containing  lower  bounds  on  the  x^. 

A REAL  array  of  N elements  containing  upper  bounds  on  the  x^. 

A REAL  array  of  N elements  containing  positive  scale  factors  for  the 

Xj^  such  that  XSfc  ~ |x^|  . 

An  INTEGER  array  of  N elements  indicating  the  state  of  each  variable  thus 


JBD(K) 


1 


= ] 


XLk  < \ < 


XU, 


= XL, 


XU, 


\ “~k 

JBD  need  only  be  preset  correctly  when  MODE 


1 
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HD  A REAL  array  of  size  N used  to  hold  the  diagonal  matrix  factor,  D of 

the  approximate  Hessian.  All  elements  of  HD  must  be  preset  positive  when 
MODE  =3  or  4. 

NN  Must  be  set  to  N*(N  - 1 ) / 2 . 

HL  A REAL  array  of  sixe  NN  used  to  hold  the  unit  lower  triangular  (by  rows) 
matrix  factor,  L of  the  approximate  Hessian.  HL  must  be  preset  when 
MODE  = 3 or  4. 


NW 

W 


HH 


Must  be  set  to  6*N  . 


A REAL  array  of  size  NW  used  as  working  space.  On  exit,  W is  partitioned 
as  follows: 


W(l)  - W(2N) 

W(2N  + ])  - W(3N) 
W(3N  + 1)  - W(4N) 
W(4N  + 1)  - W(5N) 

W(5N  + 1)  - W(6N) 


not  useful 

holds  -B  *g  (zero  for  the  bound  variables) 
holds  diag  (B)  (scaled)  computed  from  L and  D . 
holds  diag  (B)  (scaled)  computed  by  finite  differences 
(if  used) . 

holds  g for  all  variables. 


A REAL  variable  containing  the  scaled  step  size  for  finite  differencing. 

It  is  important  that  the  variables  are  well  scaled  (by  XS  ) because  HH 

will  be  applied  to  them  all.  The  user  should  examine  a range  of  values 
-3  -4  -5 

for  HH  (say  10  ,10  , 10  ) as  suggested  in  section  4.  HH  is  not 

used  when  subroutine  DERIV  is  available. 


TOLX  A REAL  variable  expressing  the  scaled  accuracy  required  in  the  variables. 
UNCMIN  attempts  to  satisfy  max| (B  *g)jJ  < TOLX  . When  using  finite 
difference  derivatives  there  is  little  point  in  requesting  TOLX  « HH  . 

TOLG  A REAL  variable  imposing  a tolerance  on  g . UNCMIN  attempts  to  satisfy 
,T  2 “ 

g g < TOLG  . Again,  setting  TOLG  « HH  is  probably  rather  ambitious. 
With  analytic  derivatives  HH  is  not  needed  and  smaller  values  of  TOLX 
and  TOLG  can  be  contemplated  (say  ~10  **) . 

MAXFN  Maximum  number  of  calls  to  CALCF  permitted. 

IPRINT  Print  control  parameter.  UNCMIN  outputs  to  channel  2 every  IPRINT 

iterations.  IPRINT  = 0 suppresses  all  output.  Various  messages  will 
appear  (see  under  Output  below)  when  IPRINT  ^0  to  indicate  progress  or 
difficulties.  The  usual  setting  is  expected  to  be  IPRINT  - 1 but  if  the 
user  only  wants  initial  and  final  values  to  be  output,  IPRINT  * 1000  is 
suggested. 
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ETA  A REAL  variable  specifying  the  line  search  accuracy  (0  < n < 1). 

ETA  = 0.01  is  recommended  initially  but  many  problems  will  be  solved  more 
efficiently  by  adopting  a much  larger  value  (up  to  say  n = 0.9). 


MODE  Indicates  which  quantities  are  being  supplied  to  M21UNC. 

MODE  = 0 Only  a point  X is  provided.  UNCMIN  will  begin  by  computing 
F and  G but  will  not  use  equation  (4.8)  to  define  the 
elements  of  HD  . Instead  HD(K)  = 1 , K = 1 ,N  . This  will  be 
the  normal  setting  with  the  analytic  derivative  version  of  the 
program. 

= 1 As  MODE  = 0 except  that  equation  (4.8)  is  employed  to  set  up 
HD  . This  will  be  the  most  common  setting  with  the  finite 
difference  program  and,  of  course,  cannot  be  used  with  analytic 
derivatives. 

= 2 F(X)  and  G(X)  are  provided  as  well.  The  contents  of  G for 
variables  initialised  on  bounds  are  irrelevant. 


= 3 X,  HD  and  HL  are  supplied.  This  is  the  normal  setting  for 
sensitivity  analysis  where  HD  and  HL  will  have  been  taken 
from  a previous  run. 

=4  X,  F,  G,  JBD,  HD  and  HL  are  supplied.  This  option  is  really 

intended  for  constrained  minimisation  by  a sequential  penalty 

2 

function  method  (eg  M2 1 IFF  see  Purcell  ) but  is  included  here  for 
completeness.  On  exit,  MODE  contains  the  number  of  function 
evaluation  performed. 

IEXIT  Indicates  the  circumstances  in  which  UNCMIN  was  left. 


IEXIT  = 1 Normal  exit.  Convergence  criteria  satisfied. 

= 2 Should  not  happen  unless  in  MODE  =3  or  4 the  user  supplies 
a non-positive  definite  matrix  (ie  HD  contains  a negative 
element) . 

= 3 MAXFN  function  calls  reached. 

= 4 No  further  progress  possible  (see  under  Output  below).  Normally 
means  that  the  limiting  accuracy  has  been  reached  and  that  to 
attain  the  desired  accuracy  the  calculation  of  F will  have  to 
be  improved  to  make  F(x)  smoother.  Alternatively  it  may  be 
that  a more  accurate  line  search  or  changes  to  HH  and/or  XS 
will  be  beneficial. 
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= 5 Should  not  occur.  It  would  mean  that  something  has  gone  wrong  in 
the  matrix  updating  routine  MODCHL. 

C. 3 User  subroutines 

The  calculation  of  F(x)  must  be  carried  out  by  a subroutine  headed 

SUBROUTINE  CALCF  (N,X,F,IC) 

DIMENS.  IN  X(N) 

On  entry*  IC  indicates  whether  F is  required  for  the  line  search  (IC  = 1)  or 

3F 

for  the  estimation  of  — by  differencing.  On  exit  N and  X must  be 

3x(IC-3) 

unchanged  but  the  user  can  set  IC  = 2 if  F cannot  be  calculated  providing  it 
is  a gradient  call  (IC  > 4).  Returning  IC  = 2 to  the  line  search  will  have 
unpredictable  consequences.  The  assumption  therefore,  is  that  F can  be 
defined  everywhere  within  the  specified  bounds  on  the  variables  but  may  not  be 
calculable  outside  (see  the  end  of  section  5.2). 

If,  on  the  other  hand,  analytic  derivatives  are  available,  CALCF  can  only 
be  called  with  IC  = ] and  the  user  must  supply  subroutine  DERIV  to  evaluate  G. 

SUBROUTINE  DERIV  (N,X,F,G, JBD.IC) 

DIMENSION  X(N) ,G(N) ,JBD(N) 


N,  X and  F should  not  be  altered.  F is  included  in  the  arguments  in  case 
it  shortens  the  calculation  of  derivatives  IC  and  JBD  appear  in  the  arguments 
to  enable  DERIV  to  economise  on  computation. 


IC  = 0 


IC  = 1 


Oniy  - 


free 


are  required.  The  free  variables  are  identified 
by  JBD(K)  = 0 . 


Only 


3F 

3x 


bound 


are  required.  The  bound  variables  are  identified 
by  JBD(K)  = ± 1 . 


If  it  is  more  convenient  to  calculate  all  derivatives  together  then, 
when  called  with  IC  = 0 , subroutine  DERIV  should  return  IC  = - 1 to  tell 
CALCG  to  retain  all  elements  of  G and  that  a subsequent  call  to  DERIV  with 
IC  = 1 would  be  unnecessary.  For  the  large  majority  of  iterations,  derivatives 
with  respect  to  variables  on  bounds  will  be  ignored. 


C.4  COMMON  areas:  None. 
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C.5  Other  subroutines 

The  RAE  subroutines  are  M21UNC,  UNCMIN,  RESET,  HESINV,  CALCG  and  CALFUN. 
M21UNC  is  responsible  for  scaling  the  variables  for  UNCMIN  and  for  unsealing 
quantities  on  exit.  RESET  modifies  the  matrices  L and  D whenever  a bound  is 
encountered.  HESINV  calculates  B * and  therefore  an  extra  JN(N  + 1)  of  REAL 
array  space  is  needed  if  B 1 is  specifically  required  by  the  user.  In  the 
controlling  segment, 

DIMENSION  BINV  (325)  ( ie  N $ 25) 

NN  = N*(N  - l)/2 

NW  = 6*N 

CALL  M21UNC  ( ) 

NH  = NN  + N 

CALL  HESINV  (N,HD,NN,HL,NH,BINV,NW,W)  . 

Note  that  HESINV  requires  2N  REAL  locations  for  working  space  and  that  the  array 
W may  conveniently  be  employed  because  M21UNC  does  not  return  anything  of 
significance  in  its  first  2N  elements. 

CALCG  works  out  the  required  partial  derivatives  either  by  calling  the 
user's  DERIV  (.analytic  expressions)  or  by  differencing.  In  the  latter  case, 
depending  on  how  it  is  called  by  UNCMIN,  CALCG  does  one  of  three  things. 

(a)  Evaluates  3F/3x  for  the  free  variables  (using  (4-1)  or  (4-2)  as 
appropriate) . 

(b)  Evaluates  3F/3x  for  the  bound  variables  (by  (4-2))  until  one  is 
found  which  satisfies  (5-2) . 

(c)  Takes  backward  differences  to  convert  forward  differences  to  central 
differences  for  the  free  variables. 

CALFUN  interfaces  between  UNCMIN  or  LINSCH  and  the  user's  CALCF.  The 
function  call  count  is  incremented  by  one  every  time  CALACF  is  called  or  by  n' 
every  time  DERIV  is  called  where  n'  = n,  n^  or  n - n^  depending  on  the  call 
and  return  values  of  IC  . n^  is  the  number  of  free  variables. 


In  addition,  the  following  NPL  routines  are  called 
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LINSCH  To  perforin  the  one-dimensional  (line)  searches. 

LDLTSL  To  solve  Bu  = v for  u . 

. (i ) ~~  / 

MODCHL  To  add  a rank-]  matrix  B -*•  B (formula  (3- 39)  is  actually  implemented; 

iT ' 


B = B 


(i)  + sf) 

p v 


NMDCHL  To  subtract  a rank-1  matrix  B ■+  B 


(i+l) 


^again  wi 


’ • -8)  ' 


e. 

/g 


(i+l)  ~ ggT 

th  (3~39) ; B = B + and  the  denominator  is  necessarily 


negative  because  Bd 

MDCOND  To  adjust  the  elements  of  HD  when  its  condition  number  becomes  too  large. 
C.6  Output 

All  printing  is  controlled  by  the  parameter  IPRINT  and  goes  to  channel  2. 


IPRINT  =0  No  output 

IPRINT  > 0 The  following  messages  can  appear 

(a)  ENTRY  TO  UNCMIN 

(b)  CONVERGENCE  CRITERION  IS  NOW:  GTG  < XXXX 

(c)  ITERATION  XX  HESSIAN  HAS  BEEN  MODIFIED  BECAUSE  COND(D)  - XXXX 

(d)  ITERATION  XX  FAILURE  X (should  only  get  'X'  = 4 ) 

(e)  TRY  AGAIN  WITH  CENTRAL  DIFFERENCE  DERIVS 

(f)  VARIABLE  XX  COULD  BE  FREED 

(g)  ENTER  LOCAL  SEARCH  PROCEDURE 

(a)  is  printed  immediately  UNCMIN  is  entered; 

(b)  indicates  that  TOLG  has  been  reduced  by  UNCMIN  because  the  point  supplied 

T 2 

already  satisfied  g g < (TOLG) 

(c)  HD  has  become  ill-conditioned  ('XXXX'  > 1.0E7); 

(d)  line  search  failure.  As  can  be  seen  from  the  flow  chart  (Fig  2)  four 
courses  of  action  are  then  possible; 

(i)  if  forward  differences  are  presently  in  use  switch  to  central  differences 
and  try  again  (message  (e) : can  only  occur  if  finite  differences  are  being 
employed,  of  course); 

(ii)  work  out  derivatives  for  bound  variables  looking  for  one  satisfying  (5-2). 
If  one  is  found,  message  (f)  is  output; 

(iii)  perform  a local  search  (message  (g)); 
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(iv)  when  actions  (i)  to  (iii)  have  been  tried  in  that  order  without  success, 
UNCMIN  terminates  with  IEXIT  = 4. 

The  summary  which  UNCMIN  provides  before  exit  should  be  self  explanatory 
(see  sample  output.  Fig  3).  The  use  of  it  is  described  in  section  8. 

In  addition,  the  following  quantities  are  output  every  IPRINT  iterations 
during  a minimisation 

IT  NF  iteration  number  (i) ; number  of  function  evaluations  so  far. 

F the  objective  function  F(x^)  . 

X(K),  K = 1 ,N  the  best  (lowest)  point  so  far  found. 

G(K),  K = 1,N  partial  derivatives  (8F/3x) ^ . A zero  value  will  normally  be 

output  for  a variable  on  a bound. 

UNCMIN  is  the  only  subroutine  capable  of  output. 

C. 7 General 

(a)  M21UNC  is  more  modern  and  sophisticated  than  M0402.  We  believe  it  to 
be  more  efficient,  more  informative  and  equally  robust. 

(b)  Two  separate  programs  are  provided  for  numeric  and  analytic  derivatives. 

(c)  There  is  no  restriction  on  problem  size  because  no  arrays  are 

explicitly  dimensioned.  In  general  one  might  expect  NFE,  the  total  number 

2 

of  calls  to  CALCF  required  to  increase  at  least  as  fast  as  n 

(d)  For  sensitivity  analysis  see  section  8. 

(e)  M21UNC  must  not  be  supplied  to  users  outside  RAE  because  of  the  NPL 
subroutines  within  it  and  the  conditions  under  which  those  routines  were 
acquired. 
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a(l),b 

c(i>,e 

d(i> 

i(i) 

DELTA 

E 

ETA 

Ef,c 

F 

FS 


(i) 

(i) 


S 


(i) 


LIST  OF  SYMBOLS 

scalars  involved  in  H updating 

the  ith  approximation  to  the  second  derivative  (Hessian)  matrix 
scalars  involved  in  B updating 
search  direction  (=  - H^g^) 
the  diagonal  matrix  factor  of  B^ 

Ax 

'accuracy'  parameter  for  M0402 

n 

error  in  the  calculation  of  g by  forward  (central)  differences 
the  objective  function  (=  F(x)) 

(positive)  scale  factor  for  F which  the  user  should  incorporate 
in  his  subroutine  CALCF 

the  vector  of  partial  dei  natives  9F/9x  at  the  start  of  the  ith 
iteration 


n 

NFE 


TOLG 

TOLX 


u,  v 


x* 


X 

XL 

XS 


a 

m 


the  true  second  derivative  matrix  (=  G(x) 

the  ith  approximation  to  the  inverse  Hessian  matrix 

the  unit  diagonal  matrix 

the  unit  lower  triangular  matrix  factor  of  B^ 
the  number  of  optimisation  variables 

the  number  of  function  evaluations  (calls  to  CALCF)  required  to 
reach  a minimum 


x'1*1’  - X(i> 

6(i*'>  - s(i> 


desired  maximum  value  of  | |g|  at  x (M21UNC  only) 

desired  accuracy  of  the  solution  point  (I  lx  “ x*  | | ) (M21UNC  only) 

occasional  vectors 

vectors  involved  in  B updating 

vector  of  optimisation  variables  at  the  start  of  the  ith  iteration 
a true  minimum  point  (g  = 0) 

the  solution  found  by  the  algorithm  (hence  F,  B , g etc) 
lower  bounds  on  x 

„(i) 


I /xsk  ~ 1 


scale  factors  chosen  so  that 

■ K 

upper  bounds  on  x 
vectors  involved  in  H updating 
the  line  search  parameter  (thus  p 
maximum  value  of  permitted  by  XL  and  XU 


(i) 


ad) 
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LIST  OF  SYMBOLS  (concluded) 


a 


1 


Ax 


e 

e 

m 


n 


minimum  value  of  allowed  when  using  forward  (M21UNC), 

central  (M21UNC),  central  (M0402)  difference  derivatives 

scaled  differencing  interval  applied  to  every  variable 

rounding  error  in  the  calculation  of  F(x) 

single  length  floating  point  precision  of  the  computer 

rounding  (truncation)  error  involved  in  calculating  g 

by  forward  (central)  differences 

line  search  accuracy  parameter;  0 < n < 1 (M21UNC  only) 
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Fig  1 


Fig  1 Flowchart  for  M0402 
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Fig  2 Flowchart  for  M21UNC 
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Fig  2 Flowchart  for  M21UNC  (continued) 


Output  from  UNCMIN 


The  output  shown  below  is  appropriate  to  the  example  discussed  in 
section  5.2  ie 

Minimise  F(x)  = 100^  - x2j2  + (]  - Xj)2 

subject  to 

*2  ^ 0.2  . 

The  following  quantities  were  specified  in  the  data 

X(l)  =-1.2  ; XL(  1 ) =~2.0  ; XU(1)  = 7.0  ; XS(1)  = 1.0 
X(2)  = 1.0  ; XL(2)  * 0.2  ; XU(2)  = 7.0  ; XS(2)  = 1.0 

MODE  = 0 : begin  with  = I 

IPRINT  = 1000  : output  only  at  the  starting  point  and  just  prior  to  exit 
DELTA  = Ax  = 0.00001  : finite  differencing  interval 
TOLG  = 0.00001  ; limiting  value  of  (g^g)^ 

TOLX  = 0.00001  ; required  accuracy  in  x 
ETA  = n = 0.01  ; fairly  accurate  line  searches 


3NTXT  TO  UKCNIN 

o * 

2.4200000?  01 

•1.2000000?  00  1.0000000?  00 
•2 . 1 960007?  02  >0.0000030?  01 

?X!T  ON  ITCMTION  6 Nf?  • 37  I3XIT  • 1 «TC  ■ 0.47033-12 

IOW2ST  FUNCTION  VAlU?  ■ 2.0074309?  00 


I 

VAR1A0L3S  « 

000 

3STINATC0 

ACC.  OF  X 

« (■  0F/0X) 

3STINATCS 

OIFF3NCNC3I 

0?  023/0X2 

01 A0  ( 10LT) 

0IAC  (NO) 

1 

•4. 20101303-01 

0 

2. 04023333-00 

•2. 01043-00 

1.4203?  02 

1.42000  02 

1.4209?  02 

2 

2.00000003-01 

-1 

0.0000000?  00 

3.3330?  00 

1.0094?  02 

1.0000?  00 

1.0000?  00 

1/0/feS  * 
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